Umer Saeed

Sr. RF Planning and Optimization Engineer
BSc Telecommunications Engineering, MS Data Science
F2017313014@umt.edu.pk
https://www.linkedin.com/in/engumersaeed/
https://github.com/umersaeed81h

14-Aug-2019

Chapter 11: Data Frame in Python

11.1 What is Data Science?

  • Data science, or data analytics, is the process of analyzing large sets of data points to answer questions about that data.
  • Pandas is a Python module that makes data science easy and effective.

11.2 Data Frames

  • The DataFrame is the main object in Pandas. It represents data with rows and columns (tabular data, like an Excel spreadsheet).

  • A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.

Following are the characteristics of a data frame:

  • There can be multiple rows and columns in the data.
  • Each row represents a sample of data.
  • Each column contains a different variable that describes the samples (rows).
  • The data in every column is usually the same type of data – e.g. numbers, strings, dates.
  • Unlike matrices, data frames can store a different class of object in each column; in a matrix, every element must be of the same class.
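
The last two characteristics can be checked directly. The following is a minimal sketch, with made-up column names and values that are not part of the original data, that builds a small data frame and inspects the type of each column:

```python
import pandas as pd

# Hypothetical sample data: each column holds a different type.
samples = pd.DataFrame({
    "city": ["Lahore", "Karachi", "Peshawar"],   # text
    "temperature": [30.5, 32.0, 38.25],          # floating-point numbers
    "rainy": [True, False, True],                # booleans
})

# One dtype per column; the data frame as a whole is heterogeneous.
print(samples.dtypes)
```

Each column keeps its own dtype, which is what separates a data frame from a matrix.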

11.3 Library Highlights

  • A fast and efficient DataFrame object for data manipulation with integrated indexing.

  • Tools for reading and writing data between in-memory data structures and different formats:

    • CSV and text files.
    • Microsoft Excel.
    • SQL databases.
    • The fast HDF5 format.
    • etc.
  • Intelligent data alignment and integrated handling of missing data: gain automatic label-based alignment in computations and easily manipulate messy data into an orderly form.
  • Flexible reshaping and pivoting of data sets.
  • Intelligent label-based slicing, fancy indexing, and subsetting of large data sets.
  • Columns can be inserted and deleted from data structures for size mutability.
  • Aggregating or transforming data with a powerful group by engine allowing split-apply-combine operations on data sets.
  • High performance merging and joining of data sets.
  • Hierarchical axis indexing provides an intuitive way of working with high-dimensional data in a lower-dimensional data structure.
  • Time-series functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging. You can even create domain-specific time offsets and join time series without losing data.
  • Highly optimized for performance, with critical code paths written in Cython or C.
  • Python with pandas is in use in a wide variety of academic and commercial domains, including Finance, Neuroscience, Economics, Statistics, Advertising, Web Analytics, and more.
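
As one illustration of the highlights above, the split-apply-combine idea behind group by can be sketched in a few lines. The data here is a small made-up weather table, and the variable names are only for illustration:

```python
import pandas as pd

weather = pd.DataFrame({
    "Event": ["Rain", "Sunny", "Rain", "Sunny"],
    "Temperature": [30, 32, 34, 36],
})

# Split the rows by Event, apply mean() to each group,
# and combine the results into a single Series indexed by Event.
mean_temperature = weather.groupby("Event")["Temperature"].mean()
print(mean_temperature)
```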

11.4 Pandas Documentation

In [1]:
from IPython.display import YouTubeVideo
YouTubeVideo('CmorAWRsCAw',width=900, height=500)
Out[1]:
In [2]:
from IPython.display import YouTubeVideo
YouTubeVideo('F6kmIpWWEdU',width=900, height=500)
Out[2]:

11.5 Import Python Library

In [3]:
import pandas as pd
import numpy as np
import os
# Import only the glob() function; a preceding "import glob" would be shadowed by this name.
from glob import glob

11.6 Pandas Version

In [4]:
pd.__version__
Out[4]:
'0.25.0'
  • If you also need to know the versions of pandas' dependencies, use the show_versions() function:
In [5]:
pd.show_versions()
INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.3.final.0
python-bits      : 64
OS               : Windows
OS-release       : 7
machine          : AMD64
processor        : Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.None

pandas           : 0.25.0
numpy            : 1.16.4
pytz             : 2019.1
dateutil         : 2.8.0
pip              : 19.1.1
setuptools       : 41.0.1
Cython           : 0.29.12
pytest           : 5.0.1
hypothesis       : None
sphinx           : 2.1.2
blosc            : None
feather          : None
xlsxwriter       : 1.1.8
lxml.etree       : 4.3.4
html5lib         : 1.0.1
pymysql          : None
psycopg2         : None
jinja2           : 2.10.1
IPython          : 7.6.1
pandas_datareader: None
bs4              : 4.7.1
bottleneck       : 1.2.1
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.3.4
matplotlib       : 3.1.0
numexpr          : 2.6.9
odfpy            : None
openpyxl         : 2.6.2
pandas_gbq       : None
pyarrow          : None
pytables         : None
s3fs             : None
scipy            : 1.2.1
sqlalchemy       : 1.3.5
tables           : 3.5.2
xarray           : None
xlrd             : 1.2.0
xlwt             : 1.3.0
xlsxwriter       : 1.1.8

11.7 Create Data Frame

In [6]:
from IPython.display import YouTubeVideo
YouTubeVideo('-Ov1N1_FbP8',width=900, height=500)
Out[6]:
In [7]:
from IPython.display import YouTubeVideo
YouTubeVideo('3k0HbcUGErE',width=900, height=500)
Out[7]:

1- Create Data Frame Using Dictionary

Format-1

In [8]:
weather=pd.DataFrame({
'Day':['1/1/2019','1/2/2019','1/3/2019','1/4/2019'],
'Temperature':[30,32,34,36],
'Windspeed':[6,7,10,12],
'Event':['Rain','Sunny','Rain','Sunny']
})
weather
Out[8]:
Day Temperature Windspeed Event
0 1/1/2019 30 6 Rain
1 1/2/2019 32 7 Sunny
2 1/3/2019 34 10 Rain
3 1/4/2019 36 12 Sunny

Format-2 (from_dict)

In [9]:
weather=pd.DataFrame.from_dict({
'Day':['1/1/2019','1/2/2019','1/3/2019','1/4/2019'],
'Temperature':[30,32,34,36],
'Windspeed':[6,7,10,12],
'Event':['Rain','Sunny','Rain','Sunny']
})
weather
Out[9]:
Day Temperature Windspeed Event
0 1/1/2019 30 6 Rain
1 1/2/2019 32 7 Sunny
2 1/3/2019 34 10 Rain
3 1/4/2019 36 12 Sunny

2- Create Data Frame Using List of Tuple

Format-1

In [10]:
weather=pd.DataFrame([
('1/1/2019',30,6,'Rain'),
('1/2/2019',32,7,'Sunny'),
('1/3/2019',34,10,'Rain'),
('1/4/2019',36,12,'Sunny')],
    columns=["Day","Temperature","Windspeed","Event"])
weather
Out[10]:
Day Temperature Windspeed Event
0 1/1/2019 30 6 Rain
1 1/2/2019 32 7 Sunny
2 1/3/2019 34 10 Rain
3 1/4/2019 36 12 Sunny

Format-2 (from_records)

In [11]:
sales = pd.DataFrame.from_records([('Jones LLC', 150, 200, 50),
         ('Alpha Co', 200, 210, 90),
         ('Blue Inc', 140, 215, 95)],
        columns = ['account', 'Jan', 'Feb', 'Mar'])
sales
Out[11]:
account Jan Feb Mar
0 Jones LLC 150 200 50
1 Alpha Co 200 210 90
2 Blue Inc 140 215 95

3- Create Data Frame Using List of Dictionary

Format-1

In [12]:
weather=pd.DataFrame([
{'Day':'1/1/2019','Temperature':30,'Windspeed':6,'Event':'Rain'},
{'Day':'1/2/2019','Temperature':32,'Windspeed':7,'Event':'Sunny'},
{'Day':'1/3/2019','Temperature':34,'Windspeed':8,'Event':'Sunny'},
{'Day':'1/4/2019','Temperature':36,'Windspeed':8,'Event':'Rain'}
])
weather
Out[12]:
Day Temperature Windspeed Event
0 1/1/2019 30 6 Rain
1 1/2/2019 32 7 Sunny
2 1/3/2019 34 8 Sunny
3 1/4/2019 36 8 Rain

Format-2

In [13]:
sales = pd.DataFrame.from_dict(
        [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
         {'account': 'Alpha Co',  'Jan': 200, 'Feb': 210, 'Mar': 215},
         {'account': 'Blue Inc',  'Jan': 50,  'Feb': 90,  'Mar': 95 }]
)
sales
Out[13]:
account Jan Feb Mar
0 Jones LLC 150 200 140
1 Alpha Co 200 210 215
2 Blue Inc 50 90 95

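  • In older Python and pandas versions, the column order of a data frame built from dictionaries could differ from the order in which the keys were written. The solutions below set the column order explicitly.

Solution-1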

In [14]:
sales = sales[['account', 'Jan', 'Feb', 'Mar']]
sales
Out[14]:
account Jan Feb Mar
0 Jones LLC 150 200 140
1 Alpha Co 200 210 215
2 Blue Inc 50 90 95

Solution-2 (from_dict)

  • Alternatively, you can create the dictionary with Python's OrderedDict.
In [15]:
from collections import OrderedDict
sales =  pd.DataFrame.from_dict(
            OrderedDict([('account', ['Jones LLC', 'Alpha Co', 'Blue Inc']),
          ('Jan', [150, 200, 50]),
          ('Feb',  [200, 210, 90]),
          ('Mar', [140, 215, 95]) ])
           )
sales
Out[15]:
account Jan Feb Mar
0 Jones LLC 150 200 140
1 Alpha Co 200 210 215
2 Blue Inc 50 90 95

Solution-3 (from_items)

In [16]:
sales = pd.DataFrame.from_items(
        [('account', ['Jones LLC', 'Alpha Co', 'Blue Inc']),
         ('Jan', [150, 200, 50]),
         ('Feb', [200, 210, 90]),
         ('Mar', [140, 215, 95]),
         ])
sales
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:5: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict(items), ...) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order.
  """
Out[16]:
account Jan Feb Mar
0 Jones LLC 150 200 140
1 Alpha Co 200 210 215
2 Blue Inc 50 90 95
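
The FutureWarning above recommends DataFrame.from_dict(dict(items)) as the replacement, and from_items was removed in pandas 1.0. A minimal sketch of that replacement, assuming Python 3.7 or later so that a plain dict keeps insertion order:

```python
import pandas as pd

items = [
    ("account", ["Jones LLC", "Alpha Co", "Blue Inc"]),
    ("Jan", [150, 200, 50]),
    ("Feb", [200, 210, 90]),
    ("Mar", [140, 215, 95]),
]

# dict() keeps insertion order on Python 3.7+, so the columns stay in order.
sales = pd.DataFrame.from_dict(dict(items))
print(sales)
```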

4- Create Data Frame Using Random Number

In [17]:
df= pd.DataFrame(np.random.rand(4, 8))
df
Out[17]:
0 1 2 3 4 5 6 7
0 0.180885 0.653903 0.131248 0.370825 0.798803 0.434655 0.592071 0.408316
1 0.902850 0.710026 0.113806 0.090753 0.012750 0.088624 0.439979 0.178760
2 0.431155 0.854234 0.241387 0.814954 0.578416 0.173259 0.438826 0.120701
3 0.458604 0.323771 0.464056 0.385634 0.029485 0.745091 0.185586 0.839838

5- Create a DataFrame from the clipboard

  • Let's say that you have some data stored in an Excel spreadsheet or a Google Sheet, and you want to get it into a DataFrame as quickly as possible.
  • Just select the data and copy it to the clipboard. Then, you can use the read_clipboard() function to read it into a DataFrame:
In [18]:
df = pd.read_clipboard()
df
Out[18]:
Data-Frame-in-Python
  • Keep in mind that if you want your work to be reproducible in the future, read_clipboard() is not the recommended approach.

Empty Data Frame

  • The empty attribute indicates whether a DataFrame is empty.
  • It is True if the DataFrame is entirely empty (no items), meaning any of the axes are of length 0.
In [19]:
df=pd.DataFrame({'A' : []})
df
Out[19]:
A
  • If we only have NaNs in our DataFrame, it is not considered empty! We will need to drop the NaNs to make the DataFrame empty.
In [20]:
df = pd.DataFrame({'A' : [np.nan]})
df
Out[20]:
A
0 NaN
In [21]:
df.empty
Out[21]:
False
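
The point about NaNs can be seen side by side in a short sketch. The variable name df_nan is only illustrative:

```python
import numpy as np
import pandas as pd

df_nan = pd.DataFrame({"A": [np.nan]})

print(df_nan.empty)           # False: there is one row, even though it holds NaN
print(df_nan.dropna().empty)  # True: dropping the NaN row leaves zero rows
```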

11.8 Import Data Frame in Pandas

In [22]:
from IPython.display import YouTubeVideo
YouTubeVideo('5_QXMwezPJE',width=900, height=500)
Out[22]:

1- Create Data Frame Using *.csv File

Example-1

In [23]:
df= pd.read_csv('Weather.csv')
df
Out[23]:
Weather Report Unnamed: 1 Unnamed: 2 Unnamed: 3
0 Date Temperature Windspeed Event
1 1/1/2019 30 6 Rain
2 1/2/2019 32 7 Sunny
3 1/3/2019 34 10 Rain
4 1/4/2019 36 12 Sunny

Solution-1::Skip Row

In [24]:
df= pd.read_csv('Weather.csv',skiprows=1)
df
Out[24]:
Date Temperature Windspeed Event
0 1/1/2019 30 6 Rain
1 1/2/2019 32 7 Sunny
2 1/3/2019 34 10 Rain
3 1/4/2019 36 12 Sunny

Solution-2::Define Header Row

In [25]:
df= pd.read_csv('Weather.csv',header=1)
df
Out[25]:
Date Temperature Windspeed Event
0 1/1/2019 30 6 Rain
1 1/2/2019 32 7 Sunny
2 1/3/2019 34 10 Rain
3 1/4/2019 36 12 Sunny
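
Both solutions give the same result. The sketch below reproduces this without a file on disk by reading from an in-memory string; the text layout (a title line above the real header) is an assumption about what Weather.csv looks like:

```python
import io
import pandas as pd

# Assumed layout: a title line, then the real header, then the data.
raw = (
    "Weather Report,,,\n"
    "Date,Temperature,Windspeed,Event\n"
    "1/1/2019,30,6,Rain\n"
    "1/2/2019,32,7,Sunny\n"
)

df_skip = pd.read_csv(io.StringIO(raw), skiprows=1)   # drop the title line
df_header = pd.read_csv(io.StringIO(raw), header=1)   # use line index 1 as the header

print(df_skip.equals(df_header))
```

With skiprows the title line is discarded before parsing, while header=1 tells the parser which line holds the column names and ignores everything above it.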

Example-2

In [26]:
df= pd.read_csv('Weather_header.csv')
df
Out[26]:
1/1/2019 30 6 Rain
0 1/2/2019 32 7 Sunny
1 1/3/2019 34 10 Rain
2 1/4/2019 36 12 Sunny

Solution-1::Header set to None

In [27]:
df= pd.read_csv('Weather_header.csv',header=None)
df
Out[27]:
0 1 2 3
0 1/1/2019 30 6 Rain
1 1/2/2019 32 7 Sunny
2 1/3/2019 34 10 Rain
3 1/4/2019 36 12 Sunny

Solution-2::Set Header Name

In [28]:
df= pd.read_csv('Weather_header.csv',
names=['Date','Temperature',
       'Windspeed','Event'])
df
Out[28]:
Date Temperature Windspeed Event
0 1/1/2019 30 6 Rain
1 1/2/2019 32 7 Sunny
2 1/3/2019 34 10 Rain
3 1/4/2019 36 12 Sunny

Set Header Name using list Method

In [29]:
df= pd.DataFrame(np.random.rand(4, 8),columns=list('abcdefgh'))
df
Out[29]:
a b c d e f g h
0 0.879158 0.339616 0.636412 0.635410 0.262065 0.522461 0.305388 0.964275
1 0.994960 0.727620 0.483729 0.697423 0.849967 0.704884 0.448210 0.555644
2 0.564142 0.683698 0.259491 0.383172 0.478696 0.801915 0.822479 0.391374
3 0.030349 0.140495 0.482342 0.010156 0.993936 0.507530 0.507909 0.062868

Example-3

Import Only a Few Rows From *.csv File

In [30]:
df= pd.read_csv('Weather1.csv',nrows=2)
df
Out[30]:
Date Temperature Windspeed Event
0 1/1/2019 30 6 Rain
1 1/2/2019 32 7 Sunny

Import Only a Few Columns Using Column Names

In [31]:
df= pd.read_csv('Weather1.csv',usecols=['Date','Temperature'])
df
Out[31]:
Date Temperature
0 1/1/2019 30
1 1/2/2019 32
2 1/3/2019 34
3 1/4/2019 36

Import Only a Few Columns Using Column Index

In [32]:
df= pd.read_csv('Weather1.csv',usecols=[0,3])
df
Out[32]:
Date Event
0 1/1/2019 Rain
1 1/2/2019 Sunny
2 1/3/2019 Rain
3 1/4/2019 Sunny

Example-4

In [33]:
df= pd.read_csv('Family.csv')
df
Out[33]:
Name ID Program
0 Umer Saeed F2017313014 MS(DS)
1 Ali Saeed F2017313016 BBA
2 Ahmed Saeed F2017313018 MS(CS)
3 Saeed Family Data Base NaN NaN
In [34]:
df= pd.read_csv('Family.csv',skipfooter=1)
df
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support skipfooter; you can avoid this warning by specifying engine='python'.
  """Entry point for launching an IPython kernel.
Out[34]:
Name ID Program
0 Umer Saeed F2017313014 MS(DS)
1 Ali Saeed F2017313016 BBA
2 Ahmed Saeed F2017313018 MS(CS)
In [35]:
df= pd.read_csv('Family.csv',skipfooter=1,engine='python')
df
Out[35]:
Name ID Program
0 Umer Saeed F2017313014 MS(DS)
1 Ali Saeed F2017313016 BBA
2 Ahmed Saeed F2017313018 MS(CS)
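
The same footer handling can be tried without the file. This sketch reads from an in-memory string whose last line is a footer; the text is an assumed approximation of Family.csv:

```python
import io
import pandas as pd

raw = (
    "Name,ID,Program\n"
    "Umer Saeed,F2017313014,MS(DS)\n"
    "Ali Saeed,F2017313016,BBA\n"
    "Saeed Family Data Base\n"
)

# skipfooter is only supported by the Python parser, so name the engine explicitly
# to avoid the ParserWarning.
family = pd.read_csv(io.StringIO(raw), skipfooter=1, engine="python")
print(family)
```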

Example-5

In [36]:
from IPython.display import YouTubeVideo
YouTubeVideo('0uBirYFhizE',width=900, height=500)
Out[36]:
In [37]:
df= pd.read_csv('check.csv')
df
Out[37]:
Name Unnamed: 1 Marks
0 Umer F201731304 50

Solution-1::Assign Column Names

In [38]:
df.columns = ['Name', 'Student ID','Marks']
df
Out[38]:
Name Student ID Marks
0 Umer F201731304 50

Solution-2::Rename

In [39]:
df= pd.read_csv('check.csv')
df.rename(columns={"Unnamed: 1":"Student ID"})
Out[39]:
Name Student ID Marks
0 Umer F201731304 50
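
Note that rename() returns a new data frame and leaves the original untouched, so the result has to be assigned to keep it. A minimal sketch, with the data typed in by hand rather than read from check.csv:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Umer"], "Unnamed: 1": ["F201731304"], "Marks": [50]})

# rename() does not modify df; keep the returned copy.
renamed = df.rename(columns={"Unnamed: 1": "Student ID"})

print(list(df.columns))       # original column names are unchanged
print(list(renamed.columns))  # the copy carries the new name
```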

Rename Column Name using Replace Method

In [40]:
df = pd.DataFrame({'col one':[100, 200], 'col two':[300, 400]})
df
Out[40]:
col one col two
0 100 300
1 200 400
In [41]:
df.columns = df.columns.str.replace(' ', '_')
df
Out[41]:
col_one col_two
0 100 300
1 200 400

Rename Column Name using add_prefix() Method

In [42]:
df = pd.DataFrame({'col one':[100, 200], 'col two':[300, 400]})
df.add_prefix('X_')
Out[42]:
X_col one X_col two
0 100 300
1 200 400

Rename Column Name using add_suffix() Method

In [43]:
df = pd.DataFrame({'col one':[100, 200], 'col two':[300, 400]})
df.add_suffix('_Y')
Out[43]:
col one_Y col two_Y
0 100 300
1 200 400

Import File From URL

In [44]:
df= pd.read_html('https://github.com/justmarkham/pandas-videos/blob/master/data/titanic_test.csv')
In [45]:
type(df)
Out[45]:
list
In [46]:
df=df[0]
In [47]:
df.head()
Out[47]:
Unnamed: 0 PassengerId Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 NaN 892 3 Kelly, Mr. James male 34.5 0 0 330911 7.8292 NaN Q
1 NaN 893 3 Wilkes, Mrs. James (Ellen Needs) female 47.0 1 0 363272 7.0000 NaN S
2 NaN 894 2 Myles, Mr. Thomas Francis male 62.0 0 0 240276 9.6875 NaN Q
3 NaN 895 3 Wirz, Mr. Albert male 27.0 0 0 315154 8.6625 NaN S
4 NaN 896 3 Hirvonen, Mrs. Alexander (Helga E Lindqvist) female 22.0 1 1 3101298 12.2875 NaN S
In [48]:
df=df.iloc[:,1:]
df.head()
Out[48]:
PassengerId Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 892 3 Kelly, Mr. James male 34.5 0 0 330911 7.8292 NaN Q
1 893 3 Wilkes, Mrs. James (Ellen Needs) female 47.0 1 0 363272 7.0000 NaN S
2 894 2 Myles, Mr. Thomas Francis male 62.0 0 0 240276 9.6875 NaN Q
3 895 3 Wirz, Mr. Albert male 27.0 0 0 315154 8.6625 NaN S
4 896 3 Hirvonen, Mrs. Alexander (Helga E Lindqvist) female 22.0 1 1 3101298 12.2875 NaN S

2- Create Data Frame Using *.tsv File

In [49]:
df = pd.read_table('chipotle.tsv')
df.head()
Out[49]:
order_id quantity item_name choice_description item_price
0 1 1 Chips and Fresh Tomato Salsa NaN $2.39
1 1 1 Izze [Clementine] $3.39
2 1 1 Nantucket Nectar [Apple] $3.39
3 1 1 Chips and Tomatillo-Green Chili Salsa NaN $2.39
4 2 2 Chicken Bowl [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98
In [50]:
df = pd.read_csv('chipotle.tsv',sep='\t')
df.head()
Out[50]:
order_id quantity item_name choice_description item_price
0 1 1 Chips and Fresh Tomato Salsa NaN $2.39
1 1 1 Izze [Clementine] $3.39
2 1 1 Nantucket Nectar [Apple] $3.39
3 1 1 Chips and Tomatillo-Green Chili Salsa NaN $2.39
4 2 2 Chicken Bowl [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98
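
The two calls above are equivalent because read_table() uses a tab as its default separator. A small sketch on an in-memory string (the three rows are made up, loosely modeled on chipotle.tsv):

```python
import io
import pandas as pd

raw = "order_id\tquantity\titem_name\n1\t1\tIzze\n2\t2\tChicken Bowl\n"

from_table = pd.read_table(io.StringIO(raw))         # tab-separated by default
from_csv = pd.read_csv(io.StringIO(raw), sep="\t")   # same parser, explicit separator

print(from_table.equals(from_csv))
```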

3- Read Table From URL

Example-1

In [51]:
df = pd.read_html('https://en.wikipedia.org/wiki/Pakistan',attrs={'class': 'wikitable'})
In [52]:
len(df)
Out[52]:
1
In [53]:
df[0]
Out[53]:
Share of world GDP (PPP)[383]
Year Share
0 1980 0.54%
1 1990 0.72%
2 2000 0.74%
3 2010 0.79%
4 2017 0.83%

Example-2

In [54]:
df = pd.read_html('https://en.wikipedia.org/wiki/Pakistan',  attrs={"class":"sortable wikitable"})
In [55]:
len(df)
Out[55]:
1
In [56]:
df[0]
Out[56]:
Administrative division Capital Population
0 Balochistan Quetta 12344408
1 Punjab Lahore 110126285
2 Sindh Karachi 47886051
3 Khyber Pakhtunkhwa Peshawar 40525047
4 Gilgit-Baltistan Gilgit 1800000
5 Azad Kashmir Muzaffarabad 4567982
6 Islamabad Capital Territory Islamabad 2851868

Example-3

In [57]:
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_UFC_events',attrs={"id":"Scheduled_events"})
In [58]:
len(df)
Out[58]:
1
In [59]:
df[0].head()
Out[59]:
Event Date Venue Location Ref.
0 UFC Fight Night: Ortega vs. Korean Zombie Dec 21, 2019 Sajik Arena Busan, South Korea [9]
1 UFC 245 Dec 14, 2019 T-Mobile Arena Las Vegas, Nevada, U.S. [9]
2 UFC on ESPN: Overeem vs. Harris Dec 7, 2019 Capital One Arena Washington, D.C., U.S. [9][10]
3 UFC Fight Night: Błachowicz vs. Jacaré Nov 16, 2019 Ginásio do Ibirapuera São Paulo, Brazil [9]
4 UFC Fight Night: dos Santos vs. Volkov Nov 9, 2019 CSKA Arena Moscow, Russia [9]

Example-4

In [60]:
df = pd.read_html('https://en.wikipedia.org/wiki/University_of_California,_Berkeley', attrs={"class":"infobox vcard"})
In [61]:
len(df)
Out[61]:
1
In [62]:
df[0].head()
Out[62]:
0 1
0 NaN NaN
1 Former names University of California (1868–1958)
2 Motto Fiat lux (Latin)
3 Motto in English Let there be light
4 Type Public research universityFlagship

Example-5

In [63]:
df = pd.read_html('https://en.wikipedia.org/wiki/University_of_California,_Berkeley', attrs={"class":"infobox"})
In [64]:
len(df)
Out[64]:
1
In [65]:
df[0].head()
Out[65]:
University rankings
National National.1
0 ARWU[105] 4
1 Forbes[106] 13
2 Times/WSJ[107] 34
3 U.S. News & World Report[108] 22
4 Washington Monthly[109] 20

Example-6

In [66]:
df = pd.read_html('https://www.esportsearnings.com/players',attrs={"class":"detail_list_table"},header=0)
In [67]:
len(df)
Out[67]:
1
In [68]:
df[0].head()
Out[68]:
Unnamed: 0 Player ID Player Name Total (Overall) Highest Paying Game Total (Game) % of Total
0 1.0 N0tail Johan Sundstein $6,890,591.79 Dota 2 $6,882,440.18 99.88%
1 2.0 JerAx Jesse Vainikka $6,470,000.02 Dota 2 $6,470,000.02 100.00%
2 3.0 ana Anathan Pham $6,000,411.96 Dota 2 $6,000,411.96 100.00%
3 4.0 Ceb Sébastien Debs $5,489,233.01 Dota 2 $5,489,233.01 100.00%
4 5.0 Topson Topias Taavitsainen $5,414,446.17 Dota 2 $5,414,446.17 100.00%

4- Create Data Frame Using *.xlsx File

Example-1

In [69]:
df=pd.read_excel('WHO_ex.xlsx','WHO')
df.head()
Out[69]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2

Example-2

Import Multiple worksheets of the same workbook

Get all sheet names

In [70]:
df = pd.ExcelFile('Weather_subject.xlsx')
sheet_names = df.sheet_names
sheet_names
Out[70]:
['Weather', 'Subject']
In [71]:
Weather = pd.read_excel(df, 'Weather')
student = pd.read_excel(df, 'Subject')
In [72]:
Weather
Out[72]:
Day Event Temperature Windspeed
0 1/1/2019 Rain 30 6
1 1/2/2019 Sunny 32 7
2 1/3/2019 Rain 34 10
3 1/4/2019 Sunny 36 12
In [73]:
student
Out[73]:
Marks Name Result Subject
0 100 Umer Pass Math
1 98 Ali Pass Phy
2 97 Ahmed Pass Chem
3 95 Abdullah Pass Bio

Display all sheets

In [74]:
for tab in sheet_names:
    print('##################################  ' + tab + '   ##################################')
    dfall = pd.read_excel(df, tab)
    print(dfall)
##################################  Weather   ##################################
        Day  Event  Temperature  Windspeed
0  1/1/2019   Rain           30          6
1  1/2/2019  Sunny           32          7
2  1/3/2019   Rain           34         10
3  1/4/2019  Sunny           36         12
##################################  Subject   ##################################
   Marks      Name Result Subject
0    100      Umer   Pass    Math
1     98       Ali   Pass     Phy
2     97     Ahmed   Pass    Chem
3     95  Abdullah   Pass     Bio

5- Fixed-width formatted lines

In [75]:
df=pd.read_fwf('KPK_Weather.prn',colspecs = [(0,63),(63,76),(76,81),(81,92),(92,102)],parse_dates=["Day"])
df
Out[75]:
City Day Event Temperat Windspeed
0 Peshawar 2019-01-01 Rain 38 16
1 Abbottabad 2019-01-04 Sunny 46 20
2 Kohat 2019-01-03 Sunny 42 19
3 Dir 2019-01-02 Rain 40 17
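
Each pair in colspecs is a half-open (start, end) character range for one column. The sketch below builds a tiny fixed-width text in memory to show how the ranges map to columns; the widths and values are assumptions for illustration and do not match KPK_Weather.prn:

```python
import io
import pandas as pd

rows = [("City", "Temp"), ("Peshawar", "38"), ("Kohat", "42")]

# Build fixed-width text: 10 characters for the city, 4 for the temperature.
raw = "".join(f"{city:<10}{temp:>4}\n" for city, temp in rows)

# Characters 0-9 form the first column, characters 10-13 the second.
cities = pd.read_fwf(io.StringIO(raw), colspecs=[(0, 10), (10, 14)])
print(cities)
```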

6- Import Table From PDF File

Install Tabula

In [76]:
import sys
!{sys.executable} -m pip install tabula-py
Requirement already satisfied: tabula-py in c:\programdata\anaconda3\lib\site-packages (1.4.1)
Requirement already satisfied: distro in c:\programdata\anaconda3\lib\site-packages (from tabula-py) (1.4.0)
Requirement already satisfied: numpy in c:\programdata\anaconda3\lib\site-packages (from tabula-py) (1.16.4)
Requirement already satisfied: pandas in c:\programdata\anaconda3\lib\site-packages (from tabula-py) (0.25.0)
Requirement already satisfied: python-dateutil>=2.6.1 in c:\programdata\anaconda3\lib\site-packages (from pandas->tabula-py) (2.8.0)
Requirement already satisfied: pytz>=2017.2 in c:\programdata\anaconda3\lib\site-packages (from pandas->tabula-py) (2019.1)
Requirement already satisfied: six>=1.5 in c:\programdata\anaconda3\lib\site-packages (from python-dateutil>=2.6.1->pandas->tabula-py) (1.12.0)

import tabula

In [77]:
import tabula
from tabula import read_pdf
In [78]:
df = read_pdf("http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf", pages='all', multiple_tables=True)
In [79]:
len(df)
Out[79]:
13
In [80]:
df[4].head()
Out[80]:
0 1 2 3 4 5 6 7 8
0 NaN NaN NaN per 100 grams NaN NaN NaN NaN NaN
1 NaN Meats & Fish NaN NaN Portion size * NaN NaN energy content NaN
2 NaN NaN (3.5 oz) NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 Anchovies tinned 300 cals 300 cals Medium NaN NaN NaN NaN NaN

11.9 Export Data Frame From Pandas

In [81]:
from IPython.display import YouTubeVideo
YouTubeVideo('-0NwrcZOKh',width=900, height=500)
Out[81]:

Pandas can export a data frame to several file formats; the examples below cover .csv and .xlsx.

Data Frame-1

In [82]:
student=pd.DataFrame({
'Name':['Umer','Ali','Ahmed','Abdullah'],
'Marks':[100,98,97,95],
'Result':['Pass','Pass','Pass','Pass'],
'Subject':['Math','Phy','Chem','Bio']
})
student
Out[82]:
Name Marks Result Subject
0 Umer 100 Pass Math
1 Ali 98 Pass Phy
2 Ahmed 97 Pass Chem
3 Abdullah 95 Pass Bio

Data Frame-2

In [83]:
weather=pd.DataFrame({
'day':['1/1/2019','1/2/2019','1/3/2019','1/4/2019'],
'temperature':[30,32,34,36],
'windspeed':[6,7,10,12],
'event':['Rain','Sunny','Rain','Sunny']
})
weather
Out[83]:
day temperature windspeed event
0 1/1/2019 30 6 Rain
1 1/2/2019 32 7 Sunny
2 1/3/2019 34 10 Rain
3 1/4/2019 36 12 Sunny

1- Write .csv Format File

In [84]:
student.to_csv('Output1.csv')

Export without Index

In [85]:
student.to_csv('output2.csv',index=False)

Export (without Index)/Selected Columns

In [86]:
student.to_csv('output3.csv',
        index=False,columns=['Name','Marks'])

Export File (Skip Header)

In [87]:
student.to_csv('output4.csv',index=False,header=False)
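
To check what these options write without opening the output files, to_csv() can be called with no file path, in which case it returns the CSV text as a string. A minimal sketch using a shortened version of the student data:

```python
import pandas as pd

student = pd.DataFrame({
    "Name": ["Umer", "Ali"],
    "Marks": [100, 98],
    "Result": ["Pass", "Pass"],
})

# With no file path, to_csv() returns the CSV text instead of writing a file.
text = student.to_csv(index=False, columns=["Name", "Marks"])
print(text)
```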

2- Write .xlsx Format File

In [88]:
student.to_excel("output5.xlsx",sheet_name="Student")

Export Without Index, Starting at a Specific Row and Column

In [89]:
student.to_excel("output6.xlsx",sheet_name="Student",
                      startrow=3,startcol=3,index=False)

Export multiple worksheets in the same workbook

In [90]:
with pd.ExcelWriter('output7.xlsx') as writer:
    weather.to_excel(writer,sheet_name="Weather",index=False)
    student.to_excel(writer,sheet_name="Subject",index=False)

Read and Write Data From a Database

In [91]:
from IPython.display import YouTubeVideo
YouTubeVideo('M-4EpNdlSuY',width=900, height=500)
Out[91]:

11.10 Convert the DataFrame to a dictionary

to_dict

In [92]:
df= pd.read_csv('Family.csv',skipfooter=1,engine='python')
df
Out[92]:
Name ID Program
0 Umer Saeed F2017313014 MS(DS)
1 Ali Saeed F2017313016 BBA
2 Ahmed Saeed F2017313018 MS(CS)
In [93]:
df.to_dict('series')
Out[93]:
{'Name': 0     Umer Saeed
 1      Ali Saeed
 2    Ahmed Saeed
 Name: Name, dtype: object, 'ID': 0    F2017313014
 1    F2017313016
 2    F2017313018
 Name: ID, dtype: object, 'Program': 0    MS(DS)
 1       BBA
 2    MS(CS)
 Name: Program, dtype: object}
In [94]:
df.to_dict('split')
Out[94]:
{'index': [0, 1, 2],
 'columns': ['Name', 'ID', 'Program'],
 'data': [['Umer Saeed', 'F2017313014', 'MS(DS)'],
  ['Ali Saeed', 'F2017313016', 'BBA'],
  ['Ahmed Saeed', 'F2017313018', 'MS(CS)']]}
In [95]:
df.to_dict('index')
Out[95]:
{0: {'Name': 'Umer Saeed', 'ID': 'F2017313014', 'Program': 'MS(DS)'},
 1: {'Name': 'Ali Saeed', 'ID': 'F2017313016', 'Program': 'BBA'},
 2: {'Name': 'Ahmed Saeed', 'ID': 'F2017313018', 'Program': 'MS(CS)'}}
In [96]:
from collections import OrderedDict, defaultdict
df.to_dict(into=OrderedDict)
Out[96]:
OrderedDict([('Name',
              OrderedDict([(0, 'Umer Saeed'),
                           (1, 'Ali Saeed'),
                           (2, 'Ahmed Saeed')])),
             ('ID',
              OrderedDict([(0, 'F2017313014'),
                           (1, 'F2017313016'),
                           (2, 'F2017313018')])),
             ('Program',
              OrderedDict([(0, 'MS(DS)'), (1, 'BBA'), (2, 'MS(CS)')]))])
In [97]:
df.to_dict('records', into=defaultdict(list))
Out[97]:
[defaultdict(list,
             {'Name': 'Umer Saeed', 'ID': 'F2017313014', 'Program': 'MS(DS)'}),
 defaultdict(list,
             {'Name': 'Ali Saeed', 'ID': 'F2017313016', 'Program': 'BBA'}),
 defaultdict(list,
             {'Name': 'Ahmed Saeed',
              'ID': 'F2017313018',
              'Program': 'MS(CS)'})]
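
Two other common orientations are 'list' and 'records'. The sketch below uses a two-row data frame typed in by hand rather than Family.csv:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Umer Saeed", "Ali Saeed"],
    "Program": ["MS(DS)", "BBA"],
})

as_lists = df.to_dict("list")       # {column: [values, ...]}
as_records = df.to_dict("records")  # [{column: value, ...}, ...], one dict per row

print(as_lists)
print(as_records)
```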

blocks

  • The DataFrame blocks attribute (deprecated since version 0.21.0) returns a dictionary containing the data of the DataFrame, keyed by dtype.
  • Homogeneous columns are placed in the same block.
In [98]:
df= pd.read_csv('Family.csv',skipfooter=1,engine='python')
df
Out[98]:
Name ID Program
0 Umer Saeed F2017313014 MS(DS)
1 Ali Saeed F2017313016 BBA
2 Ahmed Saeed F2017313018 MS(CS)
In [99]:
df.blocks
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py:5729: FutureWarning: as_blocks is deprecated and will be removed in a future version
  return self.as_blocks()
Out[99]:
{'object':           Name           ID Program
 0   Umer Saeed  F2017313014  MS(DS)
 1    Ali Saeed  F2017313016     BBA
 2  Ahmed Saeed  F2017313018  MS(CS)}

11.11 Convert the DataFrame to an array

to_records()

  • Convert DataFrame to a NumPy record array
In [100]:
df= pd.read_csv('Family.csv',skipfooter=1,engine='python')
df
Out[100]:
Name ID Program
0 Umer Saeed F2017313014 MS(DS)
1 Ali Saeed F2017313016 BBA
2 Ahmed Saeed F2017313018 MS(CS)
In [101]:
df.to_records()
Out[101]:
rec.array([(0, 'Umer Saeed', 'F2017313014', 'MS(DS)'),
           (1, 'Ali Saeed', 'F2017313016', 'BBA'),
           (2, 'Ahmed Saeed', 'F2017313018', 'MS(CS)')],
          dtype=[('index', '<i8'), ('Name', 'O'), ('ID', 'O'), ('Program', 'O')])
In [102]:
df.to_records(index=False)
Out[102]:
rec.array([('Umer Saeed', 'F2017313014', 'MS(DS)'),
           ('Ali Saeed', 'F2017313016', 'BBA'),
           ('Ahmed Saeed', 'F2017313018', 'MS(CS)')],
          dtype=[('Name', 'O'), ('ID', 'O'), ('Program', 'O')])

values

  • The values attribute returns a NumPy representation of the DataFrame.
  • Only the values in the DataFrame are returned; the axes labels are removed.
  • Warning: pandas recommends using DataFrame.to_numpy() instead.
In [103]:
import pandas as pd
df= pd.read_csv('Family.csv',skipfooter=1,engine='python')
df
Out[103]:
Name ID Program
0 Umer Saeed F2017313014 MS(DS)
1 Ali Saeed F2017313016 BBA
2 Ahmed Saeed F2017313018 MS(CS)
In [104]:
df.values
Out[104]:
array([['Umer Saeed', 'F2017313014', 'MS(DS)'],
       ['Ali Saeed', 'F2017313016', 'BBA'],
       ['Ahmed Saeed', 'F2017313018', 'MS(CS)']], dtype=object)

to_numpy

In [105]:
df.to_numpy()
Out[105]:
array([['Umer Saeed', 'F2017313014', 'MS(DS)'],
       ['Ali Saeed', 'F2017313016', 'BBA'],
       ['Ahmed Saeed', 'F2017313018', 'MS(CS)']], dtype=object)
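
A short sketch on a purely numeric data frame shows that to_numpy() keeps the values and their layout but drops the row and column labels. The variable names are only for illustration:

```python
import pandas as pd

numbers = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

arr = numbers.to_numpy()  # plain NumPy array, no index or column names

print(arr.shape)
print(arr.tolist())
```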

array

In [106]:
df['Name'].array
Out[106]:
<PandasArray>
['Umer Saeed', 'Ali Saeed', 'Ahmed Saeed']
Length: 3, dtype: object

to_list

  • Return a list of the values.
In [107]:
df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
df
Out[107]:
a b
0 1 4
1 2 5
2 3 6
In [108]:
df['a'].to_list()
Out[108]:
[1, 2, 3]

explode

  • Transform each element of a list-like to a row, replicating the index values.

Example-1

In [109]:
df = pd.Series([[1, 2, 3], 'foo', [], [3, 4]])
In [110]:
df
Out[110]:
0    [1, 2, 3]
1          foo
2           []
3       [3, 4]
dtype: object
In [111]:
df.explode()
Out[111]:
0      1
0      2
0      3
1    foo
2    NaN
3      3
3      4
dtype: object

Example-2

In [112]:
df=pd.DataFrame({'A':[1,2],'B':[[1,2],[1,2]]})
In [113]:
df
Out[113]:
A B
0 1 [1, 2]
1 2 [1, 2]
In [114]:
df.explode('B')
Out[114]:
A B
0 1 1
0 1 2
1 2 1
1 2 2

get_values

  • (DEPRECATED) Return an ndarray after converting sparse values to dense.
In [115]:
df= pd.read_csv('Family.csv',skipfooter=1,engine='python')
df
Out[115]:
Name ID Program
0 Umer Saeed F2017313014 MS(DS)
1 Ali Saeed F2017313016 BBA
2 Ahmed Saeed F2017313018 MS(CS)
In [116]:
df.get_values()
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: The 'get_values' method is deprecated and will be removed in a future version. Use '.values' or 'np.asarray(..)' instead.
  """Entry point for launching an IPython kernel.
Out[116]:
array([['Umer Saeed', 'F2017313014', 'MS(DS)'],
       ['Ali Saeed', 'F2017313016', 'BBA'],
       ['Ahmed Saeed', 'F2017313018', 'MS(CS)']], dtype=object)

11.12 General Operations on Pandas Data Frames

Head of the Data Frame

  • The head() function returns the first n rows of the object based on position (five rows by default).
  • It is useful for quickly testing if your object has the right type of data in it.
In [117]:
df_who= pd.read_csv('WHO_csv.csv')
In [118]:
df_who.head()
Out[118]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
In [119]:
df_who.head(2)
Out[119]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN

Tail of the Data Frame

In [120]:
df_who.tail()
Out[120]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
189 Venezuela (Bolivarian Republic of) Americas 29955 28.84 9.17 2.44 75 15.3 97.78 NaN 12430.0 94.7 95.1
190 Viet Nam Western Pacific 90796 22.87 9.32 1.79 75 23.0 143.39 93.2 3250.0 NaN NaN
191 Yemen Eastern Mediterranean 23852 40.72 4.54 4.35 64 60.0 47.05 63.9 2170.0 85.5 70.5
192 Zambia Africa 14075 46.73 3.95 5.77 55 88.5 60.59 71.2 1490.0 91.4 93.9
193 Zimbabwe Africa 13724 40.24 5.68 3.64 54 89.8 72.13 92.2 NaN NaN NaN
In [121]:
df_who.tail(2)
Out[121]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
192 Zambia Africa 14075 46.73 3.95 5.77 55 88.5 60.59 71.2 1490.0 91.4 93.9
193 Zimbabwe Africa 13724 40.24 5.68 3.64 54 89.8 72.13 92.2 NaN NaN NaN

Sample of the Data Frame

In [122]:
df_who.sample(3)
Out[122]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
73 Haiti Americas 10174 35.35 6.70 3.28 63 75.6 41.49 NaN 1180.0 NaN NaN
165 Swaziland Africa 1231 38.05 5.34 3.48 50 79.7 63.70 87.4 5930.0 NaN NaN
64 Georgia Europe 4358 17.62 19.47 1.82 72 19.9 102.31 99.7 5350.0 NaN NaN
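
Because sample() draws rows at random, the output above will differ on each run. A minimal sketch of how to make the draw repeatable with the random_state parameter, using a small made-up frame (`df_demo` is hypothetical, not the WHO data):

```python
import pandas as pd

# Hypothetical ten-row frame used only for illustration
df_demo = pd.DataFrame({"Country": list("ABCDEFGHIJ"), "Population": range(10)})

# The same seed gives the same rows on every call
first = df_demo.sample(3, random_state=42)
second = df_demo.sample(3, random_state=42)
print(first.equals(second))

# frac draws a fraction of the rows instead of a fixed count
half = df_demo.sample(frac=0.5, random_state=0)
print(len(half))
```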

Shape of the Data Frame

  • More Information
  • Return a tuple representing the dimensionality of the DataFrame

Method-1

In [123]:
df_who.shape
Out[123]:
(194, 13)

Method-2

In [124]:
rows,columns= df_who.shape
print(rows,columns)
194 13
In [125]:
print("No of rows in the WHO Data Set:")
print(rows)
print("No of columns in the WHO Data Set:")
print(columns)
No of rows in the WHO Data Set:
194
No of columns in the WHO Data Set:
13

Dimensions of the Data Frame

  • More Information
  • Return an int representing the number of axes / array dimensions.
  • Return 1 if Series. Otherwise return 2 if DataFrame.

Example-1

In [126]:
df_who.ndim
Out[126]:
2

Example-2

In [127]:
s = pd.Series({'a': 1, 'b': 2, 'c': 3})
s
Out[127]:
a    1
b    2
c    3
dtype: int64
In [128]:
s.ndim
Out[128]:
1

Size of the Data Frame

Example-1

In [129]:
df_who.size
Out[129]:
2522

Example-2

In [130]:
s = pd.Series({'a': 1, 'b': 2, 'c': 3})
s
Out[130]:
a    1
b    2
c    3
dtype: int64
In [131]:
s.size
Out[131]:
3
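
The size attribute counts elements. For a DataFrame that is rows × columns, which is why the WHO frame with shape (194, 13) reports 2522; for a Series it is simply the length. A small sketch with a made-up frame (`df_small` is an assumption for illustration):

```python
import pandas as pd

df_small = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

rows, columns = df_small.shape
print(df_small.size, rows * columns)           # DataFrame: rows x columns
print(df_small["A"].size, len(df_small["A"]))  # Series: number of elements
```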

Get Variable Names of the Data Frame

Method-1

  • The column labels of the DataFrame.
In [132]:
df_who.columns
Out[132]:
Index(['Country', 'Region', 'Population', 'Under15', 'Over60', 'FertilityRate',
       'LifeExpectancy', 'ChildMortality', 'CellularSubscribers',
       'LiteracyRate', 'GNI', 'PrimarySchoolEnrollmentMale',
       'PrimarySchoolEnrollmentFemale'],
      dtype='object')

Method-2

In [133]:
df_who.keys()
Out[133]:
Index(['Country', 'Region', 'Population', 'Under15', 'Over60', 'FertilityRate',
       'LifeExpectancy', 'ChildMortality', 'CellularSubscribers',
       'LiteracyRate', 'GNI', 'PrimarySchoolEnrollmentMale',
       'PrimarySchoolEnrollmentFemale'],
      dtype='object')

Method-3

  • The name attribute of a single column (a Series) returns that column's label.
In [134]:
df_who['Country'].name
Out[134]:
'Country'

Index of the Data Frame

  • The index (row labels) of the DataFrame.
In [135]:
df_who.index
Out[135]:
RangeIndex(start=0, stop=194, step=1)

Axes of the Data Frame

  • More Information
  • Return a list representing the axes of the DataFrame.
  • It has the row axis labels and column axis labels as the only members.
  • They are returned in that order.
In [136]:
df_who.axes
Out[136]:
[RangeIndex(start=0, stop=194, step=1),
 Index(['Country', 'Region', 'Population', 'Under15', 'Over60', 'FertilityRate',
        'LifeExpectancy', 'ChildMortality', 'CellularSubscribers',
        'LiteracyRate', 'GNI', 'PrimarySchoolEnrollmentMale',
        'PrimarySchoolEnrollmentFemale'],
       dtype='object')]

Set Index to a Specific Column

  • More Information
  • Set the DataFrame index (row labels) using one or more existing columns

Example-1

In [137]:
from IPython.display import YouTubeVideo
YouTubeVideo('XaCSdr7pPmY',width=900, height=500)
Out[137]:
In [138]:
df_who.set_index('Region',inplace=True)
df_who
Out[138]:
Country Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
Region
Eastern Mediterranean Afghanistan 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
Europe Albania 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
Africa Algeria 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
Europe Andorra 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
Africa Angola 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
Americas Antigua and Barbuda 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
Americas Argentina 41087 24.42 14.97 2.20 76 14.2 134.92 97.8 17130.0 NaN NaN
Europe Armenia 2969 20.34 14.06 1.74 71 16.4 103.57 99.6 6100.0 NaN NaN
Western Pacific Australia 23050 18.95 19.46 1.89 82 4.9 108.34 NaN 38110.0 96.9 97.5
Europe Austria 8464 14.51 23.52 1.44 81 4.0 154.78 NaN 42050.0 NaN NaN
Europe Azerbaijan 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1
Americas Bahamas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
Eastern Mediterranean Bahrain 1318 20.16 3.38 2.12 79 9.6 127.96 91.9 NaN NaN NaN
South-East Asia Bangladesh 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN
Americas Barbados 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
Europe Belarus 9405 15.10 19.31 1.47 71 5.2 111.88 NaN 14460.0 NaN NaN
Europe Belgium 11060 16.88 23.81 1.85 80 4.2 116.61 NaN 39190.0 98.9 99.2
Americas Belize 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
Africa Benin 10051 42.95 4.54 5.01 57 89.5 85.33 42.4 1620.0 NaN NaN
South-East Asia Bhutan 742 28.53 6.90 2.32 67 44.6 65.58 NaN 5570.0 88.3 91.5
Americas Bolivia (Plurinational State of) 10496 35.23 7.28 3.31 67 41.4 82.82 NaN 4890.0 91.2 91.5
Europe Bosnia and Herzegovina 3834 16.35 20.52 1.26 76 6.7 84.52 97.9 9190.0 86.5 88.4
Africa Botswana 2004 33.75 5.63 2.71 66 53.3 142.82 84.5 14550.0 NaN NaN
Americas Brazil 199000 24.56 10.81 1.82 74 14.4 124.26 NaN 11420.0 NaN NaN
Western Pacific Brunei Darussalam 412 25.75 7.03 2.03 77 8.0 109.17 95.2 NaN NaN NaN
Europe Bulgaria 7278 13.53 26.11 1.51 74 12.1 140.68 NaN 14160.0 99.3 99.7
Africa Burkina Faso 16460 45.66 3.88 5.78 56 102.4 45.27 NaN 1300.0 60.7 55.9
Africa Burundi 9850 44.20 3.87 6.21 53 104.3 22.33 67.2 610.0 NaN NaN
Western Pacific Cambodia 14865 31.23 7.67 2.93 65 39.7 96.17 NaN 2230.0 96.4 95.4
Africa Cameroon 21700 43.08 4.89 4.94 53 94.9 52.35 NaN 2330.0 99.6 87.4
... ... ... ... ... ... ... ... ... ... ... ... ...
Americas Suriname 535 27.83 9.55 2.32 72 20.8 178.88 94.7 NaN NaN NaN
Africa Swaziland 1231 38.05 5.34 3.48 50 79.7 63.70 87.4 5930.0 NaN NaN
Europe Sweden 9511 16.71 25.32 1.93 82 2.9 118.57 NaN 42200.0 99.7 99.0
Europe Switzerland 7997 14.79 23.25 1.51 83 4.3 131.43 NaN 52570.0 98.9 99.5
Eastern Mediterranean Syrian Arab Republic 21890 35.35 6.09 3.04 75 15.1 63.17 83.4 NaN NaN NaN
Europe Tajikistan 8009 35.75 4.80 3.81 68 58.3 90.64 99.7 2300.0 99.5 96.0
South-East Asia Thailand 66785 18.47 13.96 1.43 74 13.2 111.63 NaN 8360.0 NaN NaN
Europe The former Yugoslav Republic of Macedonia 2106 16.89 17.56 1.44 75 7.4 107.24 97.3 11090.0 97.3 99.2
South-East Asia Timor-Leste 1114 46.33 5.16 6.11 64 56.7 53.23 58.3 NaN 86.2 85.6
Africa Togo 6643 41.89 4.44 4.75 56 95.5 50.45 NaN 1040.0 NaN NaN
Western Pacific Tonga 105 37.33 7.96 3.86 72 12.8 52.63 NaN 5000.0 NaN NaN
Americas Trinidad and Tobago 1337 20.73 13.18 1.80 71 20.7 135.64 98.8 NaN 97.7 97.0
Eastern Mediterranean Tunisia 10875 23.22 10.49 2.04 76 16.1 116.93 NaN 9030.0 NaN NaN
Europe Turkey 73997 26.00 10.56 2.08 76 14.2 88.70 NaN 16940.0 99.5 98.3
Europe Turkmenistan 5173 28.65 6.30 2.38 63 52.8 68.77 99.6 8690.0 NaN NaN
Western Pacific Tuvalu 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN
Africa Uganda 36346 48.54 3.72 6.06 56 68.9 48.38 73.2 1310.0 89.7 92.3
Europe Ukraine 45530 14.18 20.76 1.45 71 10.7 122.98 99.7 7040.0 90.8 91.5
Eastern Mediterranean United Arab Emirates 9206 14.41 0.81 1.84 76 8.4 148.62 NaN 47890.0 NaN NaN
Europe United Kingdom 62783 17.54 23.06 1.90 80 4.8 130.75 NaN 36010.0 99.8 99.6
Africa United Republic of Tanzania 47783 44.85 4.89 5.36 59 54.0 55.53 73.2 1500.0 NaN NaN
Americas United States of America 318000 19.63 19.31 2.00 79 7.1 92.72 NaN 48820.0 95.4 96.1
Americas Uruguay 3395 22.05 18.59 2.07 77 7.2 140.75 98.1 14640.0 NaN NaN
Europe Uzbekistan 28541 28.90 6.38 2.38 68 39.6 91.65 99.4 3420.0 93.3 91.0
Western Pacific Vanuatu 247 37.37 6.02 3.46 72 17.9 55.76 82.6 4330.0 NaN NaN
Americas Venezuela (Bolivarian Republic of) 29955 28.84 9.17 2.44 75 15.3 97.78 NaN 12430.0 94.7 95.1
Western Pacific Viet Nam 90796 22.87 9.32 1.79 75 23.0 143.39 93.2 3250.0 NaN NaN
Eastern Mediterranean Yemen 23852 40.72 4.54 4.35 64 60.0 47.05 63.9 2170.0 85.5 70.5
Africa Zambia 14075 46.73 3.95 5.77 55 88.5 60.59 71.2 1490.0 91.4 93.9
Africa Zimbabwe 13724 40.24 5.68 3.64 54 89.8 72.13 92.2 NaN NaN NaN

194 rows × 12 columns

Example-2

In [139]:
df_who= pd.read_csv('WHO_csv.csv')
df_who.set_index(['Region','Country'],inplace=True)
df_who
Out[139]:
Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
Region Country
Eastern Mediterranean Afghanistan 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
Europe Albania 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
Africa Algeria 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
Europe Andorra 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
Africa Angola 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
Americas Antigua and Barbuda 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
Argentina 41087 24.42 14.97 2.20 76 14.2 134.92 97.8 17130.0 NaN NaN
Europe Armenia 2969 20.34 14.06 1.74 71 16.4 103.57 99.6 6100.0 NaN NaN
Western Pacific Australia 23050 18.95 19.46 1.89 82 4.9 108.34 NaN 38110.0 96.9 97.5
Europe Austria 8464 14.51 23.52 1.44 81 4.0 154.78 NaN 42050.0 NaN NaN
Azerbaijan 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1
Americas Bahamas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
Eastern Mediterranean Bahrain 1318 20.16 3.38 2.12 79 9.6 127.96 91.9 NaN NaN NaN
South-East Asia Bangladesh 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN
Americas Barbados 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
Europe Belarus 9405 15.10 19.31 1.47 71 5.2 111.88 NaN 14460.0 NaN NaN
Belgium 11060 16.88 23.81 1.85 80 4.2 116.61 NaN 39190.0 98.9 99.2
Americas Belize 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
Africa Benin 10051 42.95 4.54 5.01 57 89.5 85.33 42.4 1620.0 NaN NaN
South-East Asia Bhutan 742 28.53 6.90 2.32 67 44.6 65.58 NaN 5570.0 88.3 91.5
Americas Bolivia (Plurinational State of) 10496 35.23 7.28 3.31 67 41.4 82.82 NaN 4890.0 91.2 91.5
Europe Bosnia and Herzegovina 3834 16.35 20.52 1.26 76 6.7 84.52 97.9 9190.0 86.5 88.4
Africa Botswana 2004 33.75 5.63 2.71 66 53.3 142.82 84.5 14550.0 NaN NaN
Americas Brazil 199000 24.56 10.81 1.82 74 14.4 124.26 NaN 11420.0 NaN NaN
Western Pacific Brunei Darussalam 412 25.75 7.03 2.03 77 8.0 109.17 95.2 NaN NaN NaN
Europe Bulgaria 7278 13.53 26.11 1.51 74 12.1 140.68 NaN 14160.0 99.3 99.7
Africa Burkina Faso 16460 45.66 3.88 5.78 56 102.4 45.27 NaN 1300.0 60.7 55.9
Burundi 9850 44.20 3.87 6.21 53 104.3 22.33 67.2 610.0 NaN NaN
Western Pacific Cambodia 14865 31.23 7.67 2.93 65 39.7 96.17 NaN 2230.0 96.4 95.4
Africa Cameroon 21700 43.08 4.89 4.94 53 94.9 52.35 NaN 2330.0 99.6 87.4
... ... ... ... ... ... ... ... ... ... ... ... ...
Americas Suriname 535 27.83 9.55 2.32 72 20.8 178.88 94.7 NaN NaN NaN
Africa Swaziland 1231 38.05 5.34 3.48 50 79.7 63.70 87.4 5930.0 NaN NaN
Europe Sweden 9511 16.71 25.32 1.93 82 2.9 118.57 NaN 42200.0 99.7 99.0
Switzerland 7997 14.79 23.25 1.51 83 4.3 131.43 NaN 52570.0 98.9 99.5
Eastern Mediterranean Syrian Arab Republic 21890 35.35 6.09 3.04 75 15.1 63.17 83.4 NaN NaN NaN
Europe Tajikistan 8009 35.75 4.80 3.81 68 58.3 90.64 99.7 2300.0 99.5 96.0
South-East Asia Thailand 66785 18.47 13.96 1.43 74 13.2 111.63 NaN 8360.0 NaN NaN
Europe The former Yugoslav Republic of Macedonia 2106 16.89 17.56 1.44 75 7.4 107.24 97.3 11090.0 97.3 99.2
South-East Asia Timor-Leste 1114 46.33 5.16 6.11 64 56.7 53.23 58.3 NaN 86.2 85.6
Africa Togo 6643 41.89 4.44 4.75 56 95.5 50.45 NaN 1040.0 NaN NaN
Western Pacific Tonga 105 37.33 7.96 3.86 72 12.8 52.63 NaN 5000.0 NaN NaN
Americas Trinidad and Tobago 1337 20.73 13.18 1.80 71 20.7 135.64 98.8 NaN 97.7 97.0
Eastern Mediterranean Tunisia 10875 23.22 10.49 2.04 76 16.1 116.93 NaN 9030.0 NaN NaN
Europe Turkey 73997 26.00 10.56 2.08 76 14.2 88.70 NaN 16940.0 99.5 98.3
Turkmenistan 5173 28.65 6.30 2.38 63 52.8 68.77 99.6 8690.0 NaN NaN
Western Pacific Tuvalu 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN
Africa Uganda 36346 48.54 3.72 6.06 56 68.9 48.38 73.2 1310.0 89.7 92.3
Europe Ukraine 45530 14.18 20.76 1.45 71 10.7 122.98 99.7 7040.0 90.8 91.5
Eastern Mediterranean United Arab Emirates 9206 14.41 0.81 1.84 76 8.4 148.62 NaN 47890.0 NaN NaN
Europe United Kingdom 62783 17.54 23.06 1.90 80 4.8 130.75 NaN 36010.0 99.8 99.6
Africa United Republic of Tanzania 47783 44.85 4.89 5.36 59 54.0 55.53 73.2 1500.0 NaN NaN
Americas United States of America 318000 19.63 19.31 2.00 79 7.1 92.72 NaN 48820.0 95.4 96.1
Uruguay 3395 22.05 18.59 2.07 77 7.2 140.75 98.1 14640.0 NaN NaN
Europe Uzbekistan 28541 28.90 6.38 2.38 68 39.6 91.65 99.4 3420.0 93.3 91.0
Western Pacific Vanuatu 247 37.37 6.02 3.46 72 17.9 55.76 82.6 4330.0 NaN NaN
Americas Venezuela (Bolivarian Republic of) 29955 28.84 9.17 2.44 75 15.3 97.78 NaN 12430.0 94.7 95.1
Western Pacific Viet Nam 90796 22.87 9.32 1.79 75 23.0 143.39 93.2 3250.0 NaN NaN
Eastern Mediterranean Yemen 23852 40.72 4.54 4.35 64 60.0 47.05 63.9 2170.0 85.5 70.5
Africa Zambia 14075 46.73 3.95 5.77 55 88.5 60.59 71.2 1490.0 91.4 93.9
Zimbabwe 13724 40.24 5.68 3.64 54 89.8 72.13 92.2 NaN NaN NaN

194 rows × 11 columns

Reset Index

In [140]:
df_who.reset_index(['Region','Country'],inplace=True)
In [141]:
df_who.head()
Out[141]:
Region Country Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Eastern Mediterranean Afghanistan 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Europe Albania 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Africa Algeria 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Europe Andorra 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Africa Angola 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2

set_axis

  • Assign desired index to given axis.
  • Column or row labels can be changed by assigning a list-like object or an Index.
  • More Information
In [142]:
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df
Out[142]:
A B
0 1 4
1 2 5
2 3 6
In [143]:
df.set_axis(['a', 'b', 'c'], axis='rows', inplace=True)
df
Out[143]:
A B
a 1 4
b 2 5
c 3 6
In [144]:
df.set_axis(['A1','B1'], axis='columns', inplace=True)
df
Out[144]:
A1 B1
a 1 4
b 2 5
c 3 6
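
The cells above pass inplace=True. Later pandas releases deprecated and then removed the inplace argument of set_axis, so on a current installation those calls may raise a TypeError. A sketch of the alternative, assuming a recent pandas release in which set_axis returns a new object by default (`df_axis` is a name introduced here for illustration):

```python
import pandas as pd

df_axis = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# set_axis returns a relabelled object; assign it back instead of using inplace
df_axis = df_axis.set_axis(["a", "b", "c"], axis="index")
df_axis = df_axis.set_axis(["A1", "B1"], axis="columns")
print(df_axis)
```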

Why do some pandas commands end with parentheses (and others don't)?

In [145]:
from IPython.display import YouTubeVideo
YouTubeVideo('hSrDViyKWVk',width=900, height=500)
Out[145]:
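
As a short illustration of the question in the heading (a sketch of the general Python rule, not a transcript of the video): names such as shape and dtypes are attributes and are read without parentheses, while names such as head are methods and must be called. The frame `df_demo` is made up for this example.

```python
import pandas as pd

df_demo = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]})

# Attributes: stored facts about the object, read without parentheses
print(df_demo.shape)
print(df_demo.dtypes)

# Methods: actions that compute something and may take arguments
print(df_demo.head(2))

# Without parentheses a method is only referenced, not executed
print(callable(df_demo.head), callable(df_demo.shape))
```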

11.13 Data Type

In [146]:
df_who= pd.read_csv('WHO_csv.csv')
In [147]:
df_who.dtypes
Out[147]:
Country                           object
Region                            object
Population                         int64
Under15                          float64
Over60                           float64
FertilityRate                    float64
LifeExpectancy                     int64
ChildMortality                   float64
CellularSubscribers              float64
LiteracyRate                     float64
GNI                              float64
PrimarySchoolEnrollmentMale      float64
PrimarySchoolEnrollmentFemale    float64
dtype: object
In [148]:
df_who['Country'].dtype
Out[148]:
dtype('O')
In [149]:
df_who['Country'].dtypes
Out[149]:
dtype('O')

get_dtype_counts()

In [150]:
df_who.get_dtype_counts()
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: `get_dtype_counts` has been deprecated and will be removed in a future version. For DataFrames use `.dtypes.value_counts()
  """Entry point for launching an IPython kernel.
Out[150]:
float64    9
int64      2
object     2
dtype: int64
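
The FutureWarning above points to `.dtypes.value_counts()` as the replacement for get_dtype_counts(). A minimal sketch on a small made-up frame (`df_demo` is an assumption; its column names are borrowed from the WHO data):

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Country": ["Afghanistan", "Albania"],
    "Population": [29825, 3162],
    "Under15": [47.42, 21.33],
    "Over60": [3.82, 14.93],
})

# Count how many columns share each dtype
dtype_counts = df_demo.dtypes.value_counts()
print(dtype_counts)
```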

infer_objects

  • More Information
  • Attempt to infer better dtypes for object columns.
  • Attempts soft conversion of object-dtyped columns, leaving non-object and unconvertible columns unchanged.
  • The inference rules are the same as during normal Series/DataFrame construction.
In [151]:
df = pd.DataFrame({"A": ["a", 1, 2, 3]})
df
Out[151]:
A
0 a
1 1
2 2
3 3
In [152]:
df = df.iloc[1:]
df
Out[152]:
A
1 1
2 2
3 3
In [153]:
df.dtypes
Out[153]:
A    object
dtype: object
In [154]:
df.infer_objects().dtypes
Out[154]:
A    int64
dtype: object

Convert strings to numbers

In [155]:
from IPython.display import YouTubeVideo
YouTubeVideo('V0AWyzVMf54',width=900, height=500)
Out[155]:
In [156]:
df = pd.DataFrame({'col_one':['1.1', '2.2', '3.3'],
                   'col_two':['4.4', '5.5', '6.6'],
                   'col_three':['7.7', '8.8', '-']})
df
Out[156]:
col_one col_two col_three
0 1.1 4.4 7.7
1 2.2 5.5 8.8
2 3.3 6.6 -
  • These numbers are actually stored as strings, which results in object columns:
In [157]:
df.dtypes
Out[157]:
col_one      object
col_two      object
col_three    object
dtype: object
  • In order to do mathematical operations on these columns, we need to convert the data types to numeric. You can use the astype() method on the first two columns:
  • More Information
In [158]:
df.astype({'col_one':'float', 'col_two':'float'}).dtypes
Out[158]:
col_one      float64
col_two      float64
col_three     object
dtype: object
  • However, astype() would raise an error on the third column, because that column uses a dash to represent zero and pandas cannot convert the dash to a number.
  • Instead, you can use the to_numeric() function on the third column and tell it to convert any invalid input into NaN values:
In [159]:
pd.to_numeric(df.col_three, errors='coerce')
Out[159]:
0    7.7
1    8.8
2    NaN
Name: col_three, dtype: float64
  • If you know that the NaN values actually represent zeros, you can fill them with zeros using the fillna() method:
In [160]:
pd.to_numeric(df.col_three, errors='coerce').fillna(0)
Out[160]:
0    7.7
1    8.8
2    0.0
Name: col_three, dtype: float64
  • Finally, you can apply this function to the entire DataFrame all at once by using the apply() method:
In [161]:
df1=df.apply(pd.to_numeric, errors='coerce').fillna(0)
df1
Out[161]:
col_one col_two col_three
0 1.1 4.4 7.7
1 2.2 5.5 8.8
2 3.3 6.6 0.0
In [162]:
df1.dtypes
Out[162]:
col_one      float64
col_two      float64
col_three    float64
dtype: object
  • This one line of code accomplishes our goal: all of the columns have now been converted to float.

Example-1

In [163]:
import pandas as pd
df= pd.read_csv('Weather1.csv')
df
Out[163]:
Date Temperature Windspeed Event
0 1/1/2019 30 6 Rain
1 1/2/2019 32 7 Sunny
2 1/3/2019 34 10 Rain
3 1/4/2019 36 12 Sunny
In [164]:
df.dtypes
Out[164]:
Date           object
Temperature     int64
Windspeed       int64
Event          object
dtype: object
In [165]:
df= pd.read_csv('Weather1.csv',dtype = {"Temperature" : "float64","Windspeed" : "float64"})
df
Out[165]:
Date Temperature Windspeed Event
0 1/1/2019 30.0 6.0 Rain
1 1/2/2019 32.0 7.0 Sunny
2 1/3/2019 34.0 10.0 Rain
3 1/4/2019 36.0 12.0 Sunny
In [166]:
df.dtypes
Out[166]:
Date            object
Temperature    float64
Windspeed      float64
Event           object
dtype: object

Example-2

In [167]:
df= pd.read_csv('Weather1.csv',parse_dates=['Date'],dtype = {"Temperature" : "float64","Windspeed" : "float64"})
df
Out[167]:
Date Temperature Windspeed Event
0 2019-01-01 30.0 6.0 Rain
1 2019-01-02 32.0 7.0 Sunny
2 2019-01-03 34.0 10.0 Rain
3 2019-01-04 36.0 12.0 Sunny
In [168]:
df.dtypes
Out[168]:
Date           datetime64[ns]
Temperature           float64
Windspeed             float64
Event                  object
dtype: object
In [169]:
from IPython.display import YouTubeVideo
YouTubeVideo('P_q0tkYqvSk',width=900, height=500)
Out[169]:
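
The two examples above depend on Weather1.csv being present. The sketch below reproduces Example-2 from inline text so that it can be run anywhere; `csv_text` is a stand-in built from the displayed rows, and the comma-separated layout is an assumption about the file.

```python
import io
import pandas as pd

# Inline text standing in for Weather1.csv (rows copied from the output above)
csv_text = """Date,Temperature,Windspeed,Event
1/1/2019,30,6,Rain
1/2/2019,32,7,Sunny
1/3/2019,34,10,Rain
1/4/2019,36,12,Sunny
"""

df_weather = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Date"],  # parse this column as datetimes while reading
    dtype={"Temperature": "float64", "Windspeed": "float64"},
)
print(df_weather.dtypes)
```

Setting the types while reading avoids a second conversion pass after the data is loaded.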

ftypes

  • More Information
  • (DEPRECATED) Return the ftypes (indication of sparse/dense and dtype) in DataFrame.
  • This returns a Series with the data type of each column.
  • The result’s index is the original DataFrame’s columns.
  • Columns with mixed types are stored with the object dtype
In [170]:
df_who.ftypes
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: DataFrame.ftypes is deprecated and will be removed in a future version. Use DataFrame.dtypes instead.
  """Entry point for launching an IPython kernel.
Out[170]:
Country                           object:dense
Region                            object:dense
Population                         int64:dense
Under15                          float64:dense
Over60                           float64:dense
FertilityRate                    float64:dense
LifeExpectancy                     int64:dense
ChildMortality                   float64:dense
CellularSubscribers              float64:dense
LiteracyRate                     float64:dense
GNI                              float64:dense
PrimarySchoolEnrollmentMale      float64:dense
PrimarySchoolEnrollmentFemale    float64:dense
dtype: object

get_ftype_counts()

In [171]:
df_who.get_ftype_counts()
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: get_ftype_counts is deprecated and will be removed in a future version
  """Entry point for launching an IPython kernel.
Out[171]:
float64:dense    9
int64:dense      2
object:dense     2
dtype: int64

Select columns by data type

  • More Information
  • Return a subset of the DataFrame’s columns based on the column dtypes.
  • Let's say you need to select only the numeric columns. You can use the select_dtypes() method:
In [172]:
df_who.select_dtypes(include='number').head()
Out[172]:
Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
  • This includes both int and float columns.
In [173]:
df_who.select_dtypes(include='float64').head()
Out[173]:
Under15 Over60 FertilityRate ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 47.42 3.82 5.40 98.5 54.26 NaN 1140.0 NaN NaN
1 21.33 14.93 1.75 16.7 96.39 NaN 8820.0 NaN NaN
2 27.42 7.17 2.83 20.0 98.99 NaN 8310.0 98.2 96.4
3 15.20 22.86 NaN 3.2 75.49 NaN NaN 78.4 79.4
4 47.58 3.84 6.10 163.5 48.38 70.1 5230.0 93.1 78.2
In [174]:
df_who.select_dtypes(include='int64').head()
Out[174]:
Population LifeExpectancy
0 29825 60
1 3162 74
2 38482 73
3 78 82
4 20821 51
In [175]:
df_who.select_dtypes(include='object').head()
Out[175]:
Country Region
0 Afghanistan Eastern Mediterranean
1 Albania Europe
2 Algeria Africa
3 Andorra Europe
4 Angola Africa
  • You can tell it to include multiple data types by passing a list:
In [176]:
df_who.select_dtypes(include=['int64', 'object']).head()
Out[176]:
Country Region Population LifeExpectancy
0 Afghanistan Eastern Mediterranean 29825 60
1 Albania Europe 3162 74
2 Algeria Africa 38482 73
3 Andorra Europe 78 82
4 Angola Africa 20821 51
  • You can also tell it to exclude certain data types:
In [177]:
df_who.select_dtypes(exclude='number').head()
Out[177]:
Country Region
0 Afghanistan Eastern Mediterranean
1 Albania Europe
2 Algeria Africa
3 Andorra Europe
4 Angola Africa
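
A common follow-up is to keep only the names of the selected columns and reuse them later. A minimal sketch, using a small made-up frame (`df_demo` and `numeric_cols` are names introduced here for illustration):

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria"],
    "Population": [29825, 3162, 38482],
    "FertilityRate": [5.40, 1.75, 2.83],
})

# Collect the labels of the numeric columns as a plain Python list
numeric_cols = df_demo.select_dtypes(include="number").columns.tolist()
print(numeric_cols)

# Use the names to restrict a calculation to the numeric columns
print(df_demo[numeric_cols].mean())
```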

Date Time Data Type

In [178]:
df= pd.read_csv("DateIssue.csv")
df
Out[178]:
Date Temperature Windspeed Event
0 1/1/2019 30 6 Rain
1 1/2/2019 32 7 Sunny
2 1/3/2019 34 10 Rain
3 1/4/2019 36 12 Sunny
In [179]:
type(df.Date[0])
Out[179]:
str

Solution

In [180]:
df= pd.read_csv("DateIssue.csv",parse_dates=['Date'])
df
Out[180]:
Date Temperature Windspeed Event
0 2019-01-01 30 6 Rain
1 2019-01-02 32 7 Sunny
2 2019-01-03 34 10 Rain
3 2019-01-04 36 12 Sunny
In [181]:
type(df.Date[0])
Out[181]:
pandas._libs.tslibs.timestamps.Timestamp
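
If the data is already loaded, the same conversion can be done afterwards with pd.to_datetime(). The sketch below uses a hand-built frame (`df_dates`, an assumption) and an explicit month/day/year format, which matches how the dates were parsed in the output above.

```python
import pandas as pd

df_dates = pd.DataFrame({
    "Date": ["1/1/2019", "1/2/2019", "1/3/2019", "1/4/2019"],
    "Temperature": [30, 32, 34, 36],
})

# Assumption: the strings are month/day/year, matching the parsed output above
df_dates["Date"] = pd.to_datetime(df_dates["Date"], format="%m/%d/%Y")
print(df_dates.dtypes)

# Datetime columns expose date parts through the .dt accessor
print(df_dates["Date"].dt.day.tolist())
```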

memory_usage

In [182]:
from IPython.display import YouTubeVideo
YouTubeVideo('wDYDYGyN_cw',width=900, height=500)
Out[182]:
In [183]:
df_who.memory_usage()
Out[183]:
Index                             128
Country                          1552
Region                           1552
Population                       1552
Under15                          1552
Over60                           1552
FertilityRate                    1552
LifeExpectancy                   1552
ChildMortality                   1552
CellularSubscribers              1552
LiteracyRate                     1552
GNI                              1552
PrimarySchoolEnrollmentMale      1552
PrimarySchoolEnrollmentFemale    1552
dtype: int64
In [184]:
df_who.memory_usage(index=False)
Out[184]:
Country                          1552
Region                           1552
Population                       1552
Under15                          1552
Over60                           1552
FertilityRate                    1552
LifeExpectancy                   1552
ChildMortality                   1552
CellularSubscribers              1552
LiteracyRate                     1552
GNI                              1552
PrimarySchoolEnrollmentMale      1552
PrimarySchoolEnrollmentFemale    1552
dtype: int64
  • Use a Categorical for efficient storage of an object-dtype column with many repeated values.
In [185]:
df_who['Region'].astype('category').memory_usage(deep=True)
Out[185]:
895
In [186]:
df_who['Region'].astype('category').memory_usage(deep=False)
Out[186]:
530
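
Without deep=True, memory_usage() counts only the 8-byte references held by an object column, which is why every column above reports 1552 (194 × 8) bytes. To see what a Categorical saves, compare both forms with deep=True. The sketch below uses a made-up Series (`regions`, an assumption) with a few labels repeated many times.

```python
import pandas as pd

# Hypothetical column with three distinct labels repeated many times
regions = pd.Series(["Africa", "Europe", "Americas", "Africa"] * 250, name="Region")

# deep=True also counts the memory held by the string objects themselves
as_object = regions.astype("object").memory_usage(deep=True)
as_category = regions.astype("category").memory_usage(deep=True)
print(as_object, as_category)
```

The saving grows with the number of repeats, because a Categorical stores each distinct label once plus a small integer code per row.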

nbytes

In [187]:
df_who['Country'].nbytes
Out[187]:
1552

11.14 Summary of the Missing Values

In [188]:
from IPython.display import YouTubeVideo
YouTubeVideo('fCMrO_VzeL8',width=900, height=500)
Out[188]:
  • More Information
  • Print a concise summary of a DataFrame.
  • This method prints information about a DataFrame including the index dtype and column dtypes, non-null values and memory usage.
In [189]:
df_who= pd.read_csv('WHO_csv.csv')
In [190]:
df_who.info()
# df_who.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 194 entries, 0 to 193
Data columns (total 13 columns):
Country                          194 non-null object
Region                           194 non-null object
Population                       194 non-null int64
Under15                          194 non-null float64
Over60                           194 non-null float64
FertilityRate                    183 non-null float64
LifeExpectancy                   194 non-null int64
ChildMortality                   194 non-null float64
CellularSubscribers              184 non-null float64
LiteracyRate                     103 non-null float64
GNI                              162 non-null float64
PrimarySchoolEnrollmentMale      101 non-null float64
PrimarySchoolEnrollmentFemale    101 non-null float64
dtypes: float64(9), int64(2), object(2)
memory usage: 19.8+ KB
In [191]:
df_who.info(verbose=False)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 194 entries, 0 to 193
Columns: 13 entries, Country to PrimarySchoolEnrollmentFemale
dtypes: float64(9), int64(2), object(2)
memory usage: 19.8+ KB

isnull

In [192]:
df_who.isnull().head()
Out[192]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 False False False False False False False False False True False True True
1 False False False False False False False False False True False True True
2 False False False False False False False False False True False False False
3 False False False False False True False False False True True False False
4 False False False False False False False False False False False False False
In [193]:
df_who.isnull().sum()
Out[193]:
Country                           0
Region                            0
Population                        0
Under15                           0
Over60                            0
FertilityRate                    11
LifeExpectancy                    0
ChildMortality                    0
CellularSubscribers              10
LiteracyRate                     91
GNI                              32
PrimarySchoolEnrollmentMale      93
PrimarySchoolEnrollmentFemale    93
dtype: int64

isna

In [194]:
df_who.isna().head()
Out[194]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 False False False False False False False False False True False True True
1 False False False False False False False False False True False True True
2 False False False False False False False False False True False False False
3 False False False False False True False False False True True False False
4 False False False False False False False False False False False False False
In [195]:
df_who.isna().sum()
Out[195]:
Country                           0
Region                            0
Population                        0
Under15                           0
Over60                            0
FertilityRate                    11
LifeExpectancy                    0
ChildMortality                    0
CellularSubscribers              10
LiteracyRate                     91
GNI                              32
PrimarySchoolEnrollmentMale      93
PrimarySchoolEnrollmentFemale    93
dtype: int64

notnull

In [196]:
df_who.notnull().head()
Out[196]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 True True True True True True True True True False True False False
1 True True True True True True True True True False True False False
2 True True True True True True True True True False True True True
3 True True True True True False True True True False False True True
4 True True True True True True True True True True True True True
In [197]:
df_who.notnull().sum()
Out[197]:
Country                          194
Region                           194
Population                       194
Under15                          194
Over60                           194
FertilityRate                    183
LifeExpectancy                   194
ChildMortality                   194
CellularSubscribers              184
LiteracyRate                     103
GNI                              162
PrimarySchoolEnrollmentMale      101
PrimarySchoolEnrollmentFemale    101
dtype: int64

notna

In [198]:
df_who.notna().head()
Out[198]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 True True True True True True True True True False True False False
1 True True True True True True True True True False True False False
2 True True True True True True True True True False True True True
3 True True True True True False True True True False False True True
4 True True True True True True True True True True True True True
In [199]:
df_who.notna().sum()
Out[199]:
Country                          194
Region                           194
Population                       194
Under15                          194
Over60                           194
FertilityRate                    183
LifeExpectancy                   194
ChildMortality                   194
CellularSubscribers              184
LiteracyRate                     103
GNI                              162
PrimarySchoolEnrollmentMale      101
PrimarySchoolEnrollmentFemale    101
dtype: int64

hasnans

In [200]:
df_who['PrimarySchoolEnrollmentMale'].hasnans
Out[200]:
True
In [201]:
df_who['Region'].hasnans
Out[201]:
False
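
The outputs for notnull and notna above are identical because notnull is an alias of notna. A minimal sketch on a small made-up frame (toy is hypothetical, not the WHO data) shows the two masks agreeing, and how the per-column counts and hasnans relate:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column.
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

# notnull() is an alias of notna(), so the boolean masks are the same.
same_mask = toy.notna().equals(toy.notnull())

# Summing a boolean mask counts the True cells in each column.
missing_per_column = toy.isna().sum()
present_per_column = toy.notna().sum()

# hasnans is a Series attribute (not a method) that flags any missing value.
has_nans_a = toy["a"].hasnans

print(same_mask)
print(missing_per_column)
print(present_per_column)
print(has_nans_a)
```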

11.15 Statistical Summary of the Data Frame

In [202]:
df_who= pd.read_csv('WHO_csv.csv')
In [203]:
df_who.sum()
Out[203]:
Country                          AfghanistanAlbaniaAlgeriaAndorraAngolaAntigua ...
Region                           Eastern MediterraneanEuropeAfricaEuropeAfricaA...
Population                                                                 7053835
Under15                                                                    5574.09
Over60                                                                     2165.75
FertilityRate                                                               538.14
LifeExpectancy                                                               13582
ChildMortality                                                              7012.9
CellularSubscribers                                                          17230
LiteracyRate                                                                8622.2
GNI                                                                    2.15799e+06
PrimarySchoolEnrollmentMale                                                 9175.9
PrimarySchoolEnrollmentFemale                                               9052.9
dtype: object

Find count

  • Count non-NA cells for each column or row.
In [204]:
df_who.count()
Out[204]:
Country                          194
Region                           194
Population                       194
Under15                          194
Over60                           194
FertilityRate                    183
LifeExpectancy                   194
ChildMortality                   194
CellularSubscribers              184
LiteracyRate                     103
GNI                              162
PrimarySchoolEnrollmentMale      101
PrimarySchoolEnrollmentFemale    101
dtype: int64
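
count() can also run across each row by passing axis=1. A small sketch, assuming a made-up frame (toy is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values scattered across rows.
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

per_column = toy.count()        # non-NA cells in each column (default axis=0)
per_row = toy.count(axis=1)     # non-NA cells in each row

print(per_column)
print(per_row)
```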

Find Max

In [205]:
df_who.max()
Out[205]:
Country                                 Zimbabwe
Region                           Western Pacific
Population                               1390000
Under15                                    49.99
Over60                                     31.92
FertilityRate                               7.58
LifeExpectancy                                83
ChildMortality                             181.6
CellularSubscribers                       196.41
LiteracyRate                                99.8
GNI                                        86440
PrimarySchoolEnrollmentMale                  100
PrimarySchoolEnrollmentFemale                100
dtype: object

Find Min

In [206]:
df_who.min()
Out[206]:
Country                          Afghanistan
Region                                Africa
Population                                 1
Under15                                13.12
Over60                                  0.81
FertilityRate                           1.26
LifeExpectancy                            47
ChildMortality                           2.2
CellularSubscribers                     2.57
LiteracyRate                            31.1
GNI                                      340
PrimarySchoolEnrollmentMale             37.2
PrimarySchoolEnrollmentFemale           32.5
dtype: object

Find Mean

In [207]:
df_who.mean()
Out[207]:
Population                       36359.974227
Under15                             28.732423
Over60                              11.163660
FertilityRate                        2.940656
LifeExpectancy                      70.010309
ChildMortality                      36.148969
CellularSubscribers                 93.641522
LiteracyRate                        83.710680
GNI                              13320.925926
PrimarySchoolEnrollmentMale         90.850495
PrimarySchoolEnrollmentFemale       89.632673
dtype: float64
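
The output above silently skips the text columns (Country, Region). From pandas 2.0 onward, reductions such as mean() no longer do this by default and raise a TypeError on a mixed frame unless numeric_only=True is passed. A minimal sketch on a made-up frame (toy is hypothetical) that should behave the same way in old and new versions:

```python
import pandas as pd

# Hypothetical toy data: one text column and one numeric column.
toy = pd.DataFrame({"name": ["a", "b", "c"], "x": [1.0, 2.0, 6.0]})

# numeric_only=True restricts the reduction to numeric columns explicitly.
means = toy.mean(numeric_only=True)
medians = toy.median(numeric_only=True)

print(means)
print(medians)
```

The same keyword is accepted by median(), std(), var(), skew(), kurt() and sem().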

Find Median

In [208]:
df_who.median()
Out[208]:
Population                       7790.000
Under15                            28.650
Over60                              8.530
FertilityRate                       2.400
LifeExpectancy                     72.500
ChildMortality                     18.600
CellularSubscribers                97.745
LiteracyRate                       91.800
GNI                              7870.000
PrimarySchoolEnrollmentMale        94.700
PrimarySchoolEnrollmentFemale      95.100
dtype: float64

Find Mode

In [209]:
df = pd.DataFrame({'A': [2, 2, 1, 2, 1, 2, 3]})
In [210]:
df.mode()
Out[210]:
A
0 2
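
mode() returns a DataFrame rather than a single row because a column can have more than one mode. A short sketch, assuming made-up data (toy is hypothetical): column A has two modes, column B has one, so B is padded with NaN.

```python
import pandas as pd

# Hypothetical toy data: A is bimodal (1 and 2), B has a single mode (5).
toy = pd.DataFrame({"A": [1, 1, 2, 2, 3], "B": [5, 5, 5, 6, 7]})

# One row per mode; shorter columns are filled with NaN.
modes = toy.mode()

print(modes)
```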

Find Standard Deviation

In [211]:
df_who.std()
Out[211]:
Population                       137903.141241
Under15                              10.534573
Over60                                7.149331
FertilityRate                         1.480984
LifeExpectancy                        9.259075
ChildMortality                       37.992935
CellularSubscribers                  41.400447
LiteracyRate                         17.530645
GNI                               15192.988650
PrimarySchoolEnrollmentMale          11.017147
PrimarySchoolEnrollmentFemale        12.817614
dtype: float64

Find kurtosis

  • Return unbiased kurtosis over requested axis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).
  • Normalized by N-1

Method-1

In [212]:
df_who.kurt()
Out[212]:
Population                       77.745290
Under15                          -1.184848
Over60                           -0.528951
FertilityRate                    -0.014065
LifeExpectancy                   -0.504157
ChildMortality                    1.613926
CellularSubscribers              -0.330275
LiteracyRate                      0.390044
GNI                               3.993813
PrimarySchoolEnrollmentMale       6.182614
PrimarySchoolEnrollmentFemale     4.476949
dtype: float64

Method-2

In [213]:
df_who.kurtosis()
Out[213]:
Population                       77.745290
Under15                          -1.184848
Over60                           -0.528951
FertilityRate                    -0.014065
LifeExpectancy                   -0.504157
ChildMortality                    1.613926
CellularSubscribers              -0.330275
LiteracyRate                      0.390044
GNI                               3.993813
PrimarySchoolEnrollmentMale       6.182614
PrimarySchoolEnrollmentFemale     4.476949
dtype: float64
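
kurt() and kurtosis() are two names for the same method, which is why both outputs match. As a sketch of what "unbiased, Fisher's definition" means, the standard bias-corrected sample excess kurtosis can be computed by hand and compared with pandas. The series s is made-up data, and the formula is the usual textbook one rather than something taken from this notebook:

```python
import pandas as pd

# Hypothetical sample values.
s = pd.Series([2.0, 4.0, 4.0, 5.0, 7.0, 9.0])

n = s.count()
dev = s - s.mean()
m2 = (dev ** 2).mean()          # second central moment (biased)
m4 = (dev ** 4).mean()          # fourth central moment (biased)
g2 = m4 / m2 ** 2 - 3           # excess kurtosis: 0 for a normal distribution

# Bias-corrected sample excess kurtosis.
manual = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

print(manual, s.kurt(), s.kurtosis())
```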

Find Mean absolute deviation (MAD)

  • Return the mean absolute deviation of the values for the requested axis
In [214]:
df_who.mad(axis = 0)
Out[214]:
Population                       46069.916516
Under15                              8.974510
Over60                               6.068596
FertilityRate                        1.221057
LifeExpectancy                       7.657987
ChildMortality                      30.217722
CellularSubscribers                 33.361640
LiteracyRate                        14.488811
GNI                              11298.237311
PrimarySchoolEnrollmentMale          7.785472
PrimarySchoolEnrollmentFemale        9.217390
dtype: float64
In [215]:
df_who.mad()
Out[215]:
Population                       46069.916516
Under15                              8.974510
Over60                               6.068596
FertilityRate                        1.221057
LifeExpectancy                       7.657987
ChildMortality                      30.217722
CellularSubscribers                 33.361640
LiteracyRate                        14.488811
GNI                              11298.237311
PrimarySchoolEnrollmentMale          7.785472
PrimarySchoolEnrollmentFemale        9.217390
dtype: float64
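
DataFrame.mad was deprecated in pandas 1.5 and removed in 2.0, so the calls above only run on older versions such as the 0.25 release used here. The same quantity can be computed directly from its definition, the mean of the absolute deviations from the column mean. A minimal sketch on made-up data (toy is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; y contains one missing value.
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 6.0],
                    "y": [10.0, 10.0, np.nan, 40.0]})

# Mean absolute deviation per column; NaN cells are skipped by mean().
mad = (toy - toy.mean()).abs().mean()

print(mad)
```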

Find standard error of the mean

  • Return unbiased standard error of the mean over requested axis.
In [216]:
df_who.sem()
Out[216]:
Population                       9900.868535
Under15                             0.756338
Over60                              0.513292
FertilityRate                       0.109478
LifeExpectancy                      0.664763
ChildMortality                      2.727734
CellularSubscribers                 3.052081
LiteracyRate                        1.727346
GNI                              1193.673922
PrimarySchoolEnrollmentMale         1.096247
PrimarySchoolEnrollmentFemale       1.275400
dtype: float64
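
The standard error of the mean is the sample standard deviation divided by the square root of the number of non-missing values. For Population above, 137903.14 / sqrt(194) ≈ 9900.87. A short sketch checking the relationship on made-up data (toy is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; y contains one missing value.
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 6.0],
                    "y": [10.0, 10.0, np.nan, 40.0]})

# std() uses ddof=1 by default; count() ignores NaN cells.
manual_sem = toy.std() / np.sqrt(toy.count())

print(manual_sem)
print(toy.sem())
```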

Find variance

In [217]:
df_who.var()
Out[217]:
Population                       1.901728e+10
Under15                          1.109772e+02
Over60                           5.111293e+01
FertilityRate                    2.193315e+00
LifeExpectancy                   8.573046e+01
ChildMortality                   1.443463e+03
CellularSubscribers              1.713997e+03
LiteracyRate                     3.073235e+02
GNI                              2.308269e+08
PrimarySchoolEnrollmentMale      1.213775e+02
PrimarySchoolEnrollmentFemale    1.642912e+02
dtype: float64
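
var() is the square of std(), and both divide by N-1 by default (ddof=1). A minimal sketch on made-up data (s is hypothetical) that also shows the population variance via ddof=0:

```python
import pandas as pd

# Hypothetical sample values.
s = pd.Series([2.0, 4.0, 4.0, 5.0, 7.0, 9.0])

n = s.count()
manual_var = ((s - s.mean()) ** 2).sum() / (n - 1)   # sample variance by hand

sample_var = s.var()          # ddof=1 (default)
population_var = s.var(ddof=0)

print(manual_var, sample_var, population_var)
```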

Find skew

In [218]:
df_who.skew()
Out[218]:
Population                       8.516265
Under15                          0.208951
Over60                           0.860879
FertilityRate                    0.994529
LifeExpectancy                  -0.672055
ChildMortality                   1.459737
CellularSubscribers             -0.021405
LiteracyRate                    -1.148555
GNI                              1.874357
PrimarySchoolEnrollmentMale     -2.250329
PrimarySchoolEnrollmentFemale   -2.048406
dtype: float64

Find covariance

  • Compute pairwise covariance of columns, excluding NA/null values
In [219]:
df_who.cov()
Out[219]:
Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
Population 1.901728e+10 -86633.150300 11528.784810 -15773.115662 20783.394049 -19252.624120 -329814.152592 85625.375957 -7.587256e+07 -11305.665455 -41746.491059
Under15 -8.663315e+04 110.977235 -62.465627 14.846226 -81.589300 326.225306 -270.964656 -149.579709 -1.125773e+05 -68.624725 -88.824873
Over60 1.152878e+04 -62.465627 51.112927 -7.396885 45.551516 -169.575797 131.784842 70.092246 6.887963e+04 39.013713 50.022550
FertilityRate -1.577312e+04 14.846226 -7.396885 2.193315 -11.608870 49.542671 -38.244626 -22.193942 -1.256083e+04 -10.852196 -14.035533
LifeExpectancy 2.078339e+04 -81.589300 45.551516 -11.608870 85.730463 -325.242476 240.798919 122.090158 9.400513e+04 66.204911 86.894119
ChildMortality -1.925262e+04 326.225306 -169.575797 49.542671 -325.242476 1443.463134 -1013.178148 -573.070230 -3.210987e+05 -251.672829 -354.412195
CellularSubscribers -3.298142e+05 -270.964656 131.784842 -38.244626 240.798919 -1013.178148 1713.997023 437.981394 3.209925e+05 228.451796 293.643108
LiteracyRate 8.562538e+04 -149.579709 70.092246 -22.193942 122.090158 -573.070230 437.981394 307.323512 1.100895e+05 117.775474 163.410289
GNI -7.587256e+07 -112577.288337 68879.633552 -12560.833258 94005.125374 -321098.725558 320992.491398 110089.531746 2.308269e+08 71154.756410 93468.347863
PrimarySchoolEnrollmentMale -1.130567e+04 -68.624725 39.013713 -10.852196 66.204911 -251.672829 228.451796 117.775474 7.115476e+04 121.377525 133.144234
PrimarySchoolEnrollmentFemale -4.174649e+04 -88.824873 50.022550 -14.035533 86.894119 -354.412195 293.643108 163.410289 9.346835e+04 133.144234 164.291222
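
The covariance matrix is symmetric, and its diagonal holds each column's variance (compare the Population entry 1.901728e+10 with the var() output earlier). A small sketch on made-up data (toy is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data without missing values.
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                    "y": [2.0, 1.0, 4.0, 3.0, 9.0]})

cov_matrix = toy.cov()

# The diagonal of the covariance matrix is the per-column variance.
diagonal = np.diag(cov_matrix.values)

print(cov_matrix)
print(diagonal)
```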

Find correlation

  • Compute pairwise correlation of columns, excluding NA/null values
In [220]:
from IPython.display import YouTubeVideo
YouTubeVideo('sCkS-0kIRCE',width=900, height=500)
Out[220]:
In [221]:
df_who.corr(method='pearson')
Out[221]:
Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
Population 1.000000 -0.059634 0.011693 -0.075156 0.016277 -0.003675 -0.056345 0.035232 -0.033255 -0.024105 -0.076504
Under15 -0.059634 1.000000 -0.829390 0.936096 -0.836467 0.815076 -0.613113 -0.769803 -0.685432 -0.562826 -0.626167
Over60 0.011693 -0.829390 1.000000 -0.699413 0.688129 -0.624303 0.439898 0.596493 0.622878 0.460950 0.508001
FertilityRate -0.075156 0.936096 -0.699413 1.000000 -0.839840 0.864038 -0.621337 -0.802261 -0.548732 -0.633958 -0.702626
LifeExpectancy 0.016277 -0.836467 0.688129 -0.839840 1.000000 -0.924564 0.623251 0.725924 0.665786 0.630538 0.711334
ChildMortality -0.003675 0.815076 -0.624303 0.864038 -0.924564 1.000000 -0.637557 -0.779498 -0.544689 -0.602334 -0.729074
CellularSubscribers -0.056345 -0.613113 0.439898 -0.621337 0.623251 -0.637557 1.000000 0.574928 0.537116 0.545671 0.603955
LiteracyRate 0.035232 -0.769803 0.596493 -0.802261 0.725924 -0.779498 0.574928 1.000000 0.479062 0.537777 0.652675
GNI -0.033255 -0.685432 0.622878 -0.548732 0.665786 -0.544689 0.537116 0.479062 1.000000 0.378256 0.424768
PrimarySchoolEnrollmentMale -0.024105 -0.562826 0.460950 -0.633958 0.630538 -0.602334 0.545671 0.537777 0.378256 1.000000 0.942857
PrimarySchoolEnrollmentFemale -0.076504 -0.626167 0.508001 -0.702626 0.711334 -0.729074 0.603955 0.652675 0.424768 0.942857 1.000000

The correlation methods available in pandas are:

  • pearson : standard correlation coefficient
  • kendall : Kendall Tau correlation coefficient
  • spearman : Spearman rank correlation
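
As a sketch of how these methods relate: the Pearson coefficient is the covariance divided by the product of the two standard deviations, and the Spearman coefficient is the Pearson coefficient computed on ranks. The frame toy is made-up data with no missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data without missing values.
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                    "y": [2.0, 1.0, 4.0, 3.0, 9.0]})

std = toy.std()

# Pearson: cov(i, j) / (std_i * std_j) for every pair of columns.
manual_pearson = toy.cov() / np.outer(std, std)
pearson = toy.corr(method="pearson")

# Spearman: Pearson correlation of the ranked values.
spearman = toy.corr(method="spearman")
rank_pearson = toy.rank().corr(method="pearson")

print(pearson)
print(spearman)
```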

Find corrwith

  • Compute pairwise correlation between rows or columns of two DataFrame objects.
In [222]:
df1 = pd.DataFrame({"A":[1, 5, 7, 8],  
                    "B":[5, 8, 4, 3], 
                    "C":[10, 4, 9, 3]})
In [223]:
df2 = pd.DataFrame({"A":[5, 3, 6, 4], 
                    "B":[11, 2, 4, 3], 
                    "C":[4, 3, 8, 5]})
  • To find the correlation between the matching columns of df1 and df2, use axis=0; to correlate matching rows instead, use axis=1.
In [224]:
df1.corrwith(df2, axis = 0)
Out[224]:
A   -0.041703
B   -0.151186
C    0.395437
dtype: float64
In [225]:
df1.corrwith(df2, axis = 1)
Out[225]:
0   -0.195254
1   -0.970725
2    0.993399
3    0.000000
dtype: float64
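
With axis=0, corrwith pairs up columns that share a name and returns one coefficient per column; with axis=1 it pairs up rows that share an index label. A short sketch, reusing the df1 and df2 values from above, checks the column-wise result against Series.corr (the name manual is introduced here only for illustration):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1, 5, 7, 8],
                    "B": [5, 8, 4, 3],
                    "C": [10, 4, 9, 3]})
df2 = pd.DataFrame({"A": [5, 3, 6, 4],
                    "B": [11, 2, 4, 3],
                    "C": [4, 3, 8, 5]})

by_column = df1.corrwith(df2, axis=0)   # one value per column label
by_row = df1.corrwith(df2, axis=1)      # one value per row label

# Same result, computed one column pair at a time.
manual = pd.Series({c: df1[c].corr(df2[c]) for c in df1.columns})

print(by_column)
print(by_row)
```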

Find quantile

In [226]:
df_who.quantile([0.25, 0.50, 0.75, 0.10], numeric_only=True)  # 0.10 is the 10th percentile; use 1.0 for the maximum
Out[226]:
Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0.25 1695.75 18.7175 5.2000 1.835 64.0 8.425 63.5675 71.60 2335.0 87.7 87.3
0.50 7790.00 28.6500 8.5300 2.400 72.5 18.600 97.7450 91.80 7870.0 94.7 95.1
0.75 24535.25 37.7525 16.6875 3.905 76.0 55.975 120.8050 97.85 17557.5 98.1 97.9
0.10 188.30 15.0150 4.4430 1.470 56.0 4.030 40.6360 57.04 1186.0 76.7 72.1
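
The last quantile requested above is the 10th percentile (the row labelled 0.10), not the maximum. Quantiles are given as fractions between 0 and 1, so the maximum corresponds to 1.0. A minimal sketch on made-up data (s is hypothetical):

```python
import pandas as pd

# Hypothetical sample values.
s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

tenth_a = s.quantile(0.100)   # same number as 0.10
tenth_b = s.quantile(0.10)
top = s.quantile(1.0)         # the 100th percentile is the maximum

print(tenth_a, tenth_b, top)
```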

Apply the Count Function Along Rows

In [227]:
from IPython.display import YouTubeVideo
YouTubeVideo('PtO3t6ynH-8',width=900, height=500)
Out[227]:
In [228]:
spl2= pd.read_csv('Scopus_1926-1950.csv',encoding='latin-1')
# Preprocessing
# Remove spaces inside the Author(s)ID string
spl2['Author(s)ID'] = spl2['Author(s)ID'].str.replace(" ","")
# Strip the trailing ';' so that splitting does not create an empty last field
spl2['Author(s)ID'] = spl2['Author(s)ID'].str.rstrip(';')

# Split the IDs on ';' into one column per author (Author_0, Author_1, ...)
spl2 = spl2.join(spl2['Author(s)ID'].str.split(';', expand=True).add_prefix('Author_'))
spl2.head()
Out[228]:
Authors Author(s)ID Title Year Sourcetitle Volume Issue Art.No. Pagestart Pageend ... DocumentType PublicationStage AccessType Source EID Author_0 Author_1 Author_2 Author_3 Author_4
0 Bhatnagar, S.S., Prasad, M., Singh, B. 55429686600;57197403605;55480056700 Einige physikalische Eigenschaften von einwert... 1926 Kolloid-Zeitschrift 38.0 3 NaN 218.0 222.0 ... Article Final NaN Scopus 2-s2.0-34347100553 55429686600 57197403605 55480056700 None None
1 Malik, K.S. 16664426200 Viskositäten einwertiger Salze der höheren Fet... 1926 Kolloid-Zeitschrift 39.0 4 NaN 322.0 324.0 ... Article Final NaN Scopus 2-s2.0-34347109543 16664426200 None None None None
2 Christensen, J. 57190353901 THE NEW AFGHANISTAN. 1926 The Muslim World 16.0 4 NaN 349.0 356.0 ... Article Final NaN Scopus 2-s2.0-84980098079 57190353901 None None None None
3 OSMASTON, B.B. 57190078988 The Birda of Ladakh 1926 Ibis 68.0 2 NaN 446.0 448.0 ... Letter Final NaN Scopus 2-s2.0-84977249430 57190078988 None None None None
4 Bakhsh, J.A. 57190457552 THE STORY OF MY CONVERSION 1926 The Muslim World 16.0 1 NaN 79.0 84.0 ... Article Final NaN Scopus 2-s2.0-84894912026 57190457552 None None None None

5 rows × 24 columns

In [229]:
spl2.keys()
Out[229]:
Index(['Authors', 'Author(s)ID', 'Title', 'Year', 'Sourcetitle', 'Volume',
       'Issue', 'Art.No.', 'Pagestart', 'Pageend', 'Pagecount', 'Citedby',
       'DOI', 'Link', 'DocumentType', 'PublicationStage', 'AccessType',
       'Source', 'EID', 'Author_0', 'Author_1', 'Author_2', 'Author_3',
       'Author_4'],
      dtype='object')
In [230]:
spl2['AuthorCounts']=spl2.iloc[:,19:].count(axis=1)
spl2.head()
Out[230]:
Authors Author(s)ID Title Year Sourcetitle Volume Issue Art.No. Pagestart Pageend ... PublicationStage AccessType Source EID Author_0 Author_1 Author_2 Author_3 Author_4 AuthorCounts
0 Bhatnagar, S.S., Prasad, M., Singh, B. 55429686600;57197403605;55480056700 Einige physikalische Eigenschaften von einwert... 1926 Kolloid-Zeitschrift 38.0 3 NaN 218.0 222.0 ... Final NaN Scopus 2-s2.0-34347100553 55429686600 57197403605 55480056700 None None 3
1 Malik, K.S. 16664426200 Viskositäten einwertiger Salze der höheren Fet... 1926 Kolloid-Zeitschrift 39.0 4 NaN 322.0 324.0 ... Final NaN Scopus 2-s2.0-34347109543 16664426200 None None None None 1
2 Christensen, J. 57190353901 THE NEW AFGHANISTAN. 1926 The Muslim World 16.0 4 NaN 349.0 356.0 ... Final NaN Scopus 2-s2.0-84980098079 57190353901 None None None None 1
3 OSMASTON, B.B. 57190078988 The Birda of Ladakh 1926 Ibis 68.0 2 NaN 446.0 448.0 ... Final NaN Scopus 2-s2.0-84977249430 57190078988 None None None None 1
4 Bakhsh, J.A. 57190457552 THE STORY OF MY CONVERSION 1926 The Muslim World 16.0 1 NaN 79.0 84.0 ... Final NaN Scopus 2-s2.0-84894912026 57190457552 None None None None 1

5 rows × 25 columns
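
The same split-then-count pattern can be sketched on a tiny made-up frame (toy and its ids column are hypothetical, not the Scopus file). Selecting the split columns by name with filter(like=...) is one alternative to the positional slice iloc[:, 19:], which depends on the column order of the file:

```python
import pandas as pd

# Hypothetical toy data: ';'-separated IDs with stray spaces and a trailing ';'.
toy = pd.DataFrame({"ids": ["1;2;3;", "4", "5; 6"]})

# Clean the string: drop spaces, then strip the trailing separator.
clean = toy["ids"].str.replace(" ", "", regex=False).str.rstrip(";")

# One column per ID; rows with fewer IDs are padded with missing values.
parts = clean.str.split(";", expand=True).add_prefix("Author_")
toy = toy.join(parts)

# count(axis=1) counts the non-missing cells in each row.
toy["AuthorCounts"] = toy.filter(like="Author_").count(axis=1)

print(toy)
```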

Describe

In [231]:
from IPython.display import YouTubeVideo
YouTubeVideo('B-r9VuK80dk',width=900, height=500)
Out[231]:
  • Generates descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
In [232]:
df_who.describe()
Out[232]:
Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
count 1.940000e+02 194.000000 194.000000 183.000000 194.000000 194.000000 184.000000 103.000000 162.000000 101.000000 101.000000
mean 3.635997e+04 28.732423 11.163660 2.940656 70.010309 36.148969 93.641522 83.710680 13320.925926 90.850495 89.632673
std 1.379031e+05 10.534573 7.149331 1.480984 9.259075 37.992935 41.400447 17.530645 15192.988650 11.017147 12.817614
min 1.000000e+00 13.120000 0.810000 1.260000 47.000000 2.200000 2.570000 31.100000 340.000000 37.200000 32.500000
25% 1.695750e+03 18.717500 5.200000 1.835000 64.000000 8.425000 63.567500 71.600000 2335.000000 87.700000 87.300000
50% 7.790000e+03 28.650000 8.530000 2.400000 72.500000 18.600000 97.745000 91.800000 7870.000000 94.700000 95.100000
75% 2.453525e+04 37.752500 16.687500 3.905000 76.000000 55.975000 120.805000 97.850000 17557.500000 98.100000 97.900000
max 1.390000e+06 49.990000 31.920000 7.580000 83.000000 181.600000 196.410000 99.800000 86440.000000 100.000000 100.000000
In [233]:
df_who.describe(include='all')
Out[233]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
count 194 194 1.940000e+02 194.000000 194.000000 183.000000 194.000000 194.000000 184.000000 103.000000 162.000000 101.000000 101.000000
unique 194 6 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
top Israel Europe NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
freq 1 53 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
mean NaN NaN 3.635997e+04 28.732423 11.163660 2.940656 70.010309 36.148969 93.641522 83.710680 13320.925926 90.850495 89.632673
std NaN NaN 1.379031e+05 10.534573 7.149331 1.480984 9.259075 37.992935 41.400447 17.530645 15192.988650 11.017147 12.817614
min NaN NaN 1.000000e+00 13.120000 0.810000 1.260000 47.000000 2.200000 2.570000 31.100000 340.000000 37.200000 32.500000
25% NaN NaN 1.695750e+03 18.717500 5.200000 1.835000 64.000000 8.425000 63.567500 71.600000 2335.000000 87.700000 87.300000
50% NaN NaN 7.790000e+03 28.650000 8.530000 2.400000 72.500000 18.600000 97.745000 91.800000 7870.000000 94.700000 95.100000
75% NaN NaN 2.453525e+04 37.752500 16.687500 3.905000 76.000000 55.975000 120.805000 97.850000 17557.500000 98.100000 97.900000
max NaN NaN 1.390000e+06 49.990000 31.920000 7.580000 83.000000 181.600000 196.410000 99.800000 86440.000000 100.000000 100.000000
In [234]:
df_who['Population'].describe()
Out[234]:
count    1.940000e+02
mean     3.635997e+04
std      1.379031e+05
min      1.000000e+00
25%      1.695750e+03
50%      7.790000e+03
75%      2.453525e+04
max      1.390000e+06
Name: Population, dtype: float64
In [235]:
df_who[['Population','Over60']].describe()
Out[235]:
Population Over60
count 1.940000e+02 194.000000
mean 3.635997e+04 11.163660
std 1.379031e+05 7.149331
min 1.000000e+00 0.810000
25% 1.695750e+03 5.200000
50% 7.790000e+03 8.530000
75% 2.453525e+04 16.687500
max 1.390000e+06 31.920000
In [236]:
df_who.describe(include=['object'])
Out[236]:
Country Region
count 194 194
unique 194 6
top Israel Europe
freq 1 53
In [237]:
df_who.describe().loc['min':'max']
Out[237]:
Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
min 1.00 13.1200 0.8100 1.260 47.0 2.200 2.5700 31.10 340.0 37.2 32.5
25% 1695.75 18.7175 5.2000 1.835 64.0 8.425 63.5675 71.60 2335.0 87.7 87.3
50% 7790.00 28.6500 8.5300 2.400 72.5 18.600 97.7450 91.80 7870.0 94.7 95.1
75% 24535.25 37.7525 16.6875 3.905 76.0 55.975 120.8050 97.85 17557.5 98.1 97.9
max 1390000.00 49.9900 31.9200 7.580 83.0 181.600 196.4100 99.80 86440.0 100.0 100.0
In [238]:
df_who.describe().loc['min':'max', 'Under15':'ChildMortality']
Out[238]:
Under15 Over60 FertilityRate LifeExpectancy ChildMortality
min 13.1200 0.8100 1.260 47.0 2.200
25% 18.7175 5.2000 1.835 64.0 8.425
50% 28.6500 8.5300 2.400 72.5 18.600
75% 37.7525 16.6875 3.905 76.0 55.975
max 49.9900 31.9200 7.580 83.0 181.600
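
describe() reports the 25%, 50% and 75% quantiles by default; other cut points can be requested with the percentiles argument. A small sketch on made-up data (toy is hypothetical):

```python
import pandas as pd

# Hypothetical toy data: one numeric column and one text column.
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                    "g": ["a", "a", "b", "b", "c"]})

# Ask for the 10th and 90th percentiles instead of the quartiles.
summary = toy.describe(percentiles=[0.1, 0.9])

print(summary)
```

By default only numeric columns are summarised, so the text column g does not appear in the result.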

11.16 Pandas and Pandas Profiling to analyze data

  • Let's say that you've got a new dataset and want to explore it quickly without much work. A separate package called pandas-profiling is designed for this purpose.
  • First install it using conda or pip. Once that's done, import pandas_profiling.
  • You can also install pandas-profiling from inside a Jupyter Notebook using the command shown below.
In [239]:
from IPython.display import YouTubeVideo
YouTubeVideo('EaHWjkEPHr8',width=900, height=500)
Out[239]:

Pandas Profiling Installation

In [240]:
import sys
!{sys.executable} -m pip install pandas-profiling
Requirement already satisfied: pandas-profiling in c:\programdata\anaconda3\lib\site-packages (2.3.0)
Requirement already satisfied: jinja2>=2.8 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (2.10.1)
Requirement already satisfied: phik>=0.9.8 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (0.9.8)
Requirement already satisfied: astropy in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (3.2.1)
Requirement already satisfied: confuse>=1.0.0 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (1.0.0)
Requirement already satisfied: pandas>=0.19 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (0.25.0)
Requirement already satisfied: htmlmin>=0.1.12 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (0.1.12)
Requirement already satisfied: matplotlib>=1.4 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (3.1.0)
Requirement already satisfied: missingno>=0.4.2 in c:\programdata\anaconda3\lib\site-packages (from pandas-profiling) (0.4.2)
Requirement already satisfied: MarkupSafe>=0.23 in c:\programdata\anaconda3\lib\site-packages (from jinja2>=2.8->pandas-profiling) (1.1.1)
Requirement already satisfied: numpy>=1.15.4 in c:\programdata\anaconda3\lib\site-packages (from phik>=0.9.8->pandas-profiling) (1.16.4)
Requirement already satisfied: numba>=0.38.1 in c:\programdata\anaconda3\lib\site-packages (from phik>=0.9.8->pandas-profiling) (0.44.1)
Requirement already satisfied: pytest>=4.0.2 in c:\programdata\anaconda3\lib\site-packages (from phik>=0.9.8->pandas-profiling) (5.0.1)
Requirement already satisfied: scipy>=1.1.0 in c:\programdata\anaconda3\lib\site-packages (from phik>=0.9.8->pandas-profiling) (1.2.1)
Requirement already satisfied: jupyter-client>=5.2.3 in c:\programdata\anaconda3\lib\site-packages (from phik>=0.9.8->pandas-profiling) (5.3.1)
Requirement already satisfied: pytest-pylint>=0.13.0 in c:\programdata\anaconda3\lib\site-packages (from phik>=0.9.8->pandas-profiling) (0.14.1)
Requirement already satisfied: nbconvert>=5.3.1 in c:\programdata\anaconda3\lib\site-packages (from phik>=0.9.8->pandas-profiling) (5.5.0)
Requirement already satisfied: pyyaml in c:\programdata\anaconda3\lib\site-packages (from confuse>=1.0.0->pandas-profiling) (5.1.1)
Requirement already satisfied: python-dateutil>=2.6.1 in c:\programdata\anaconda3\lib\site-packages (from pandas>=0.19->pandas-profiling) (2.8.0)
Requirement already satisfied: pytz>=2017.2 in c:\programdata\anaconda3\lib\site-packages (from pandas>=0.19->pandas-profiling) (2019.1)
Requirement already satisfied: cycler>=0.10 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=1.4->pandas-profiling) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=1.4->pandas-profiling) (1.1.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=1.4->pandas-profiling) (2.4.0)
Requirement already satisfied: seaborn in c:\programdata\anaconda3\lib\site-packages (from missingno>=0.4.2->pandas-profiling) (0.9.0)
Requirement already satisfied: llvmlite>=0.29.0 in c:\programdata\anaconda3\lib\site-packages (from numba>=0.38.1->phik>=0.9.8->pandas-profiling) (0.29.0)
Requirement already satisfied: py>=1.5.0 in c:\programdata\anaconda3\lib\site-packages (from pytest>=4.0.2->phik>=0.9.8->pandas-profiling) (1.8.0)
Requirement already satisfied: packaging in c:\programdata\anaconda3\lib\site-packages (from pytest>=4.0.2->phik>=0.9.8->pandas-profiling) (19.0)
Requirement already satisfied: attrs>=17.4.0 in c:\programdata\anaconda3\lib\site-packages (from pytest>=4.0.2->phik>=0.9.8->pandas-profiling) (19.1.0)
Requirement already satisfied: more-itertools>=4.0.0 in c:\programdata\anaconda3\lib\site-packages (from pytest>=4.0.2->phik>=0.9.8->pandas-profiling) (7.0.0)
Requirement already satisfied: atomicwrites>=1.0 in c:\programdata\anaconda3\lib\site-packages (from pytest>=4.0.2->phik>=0.9.8->pandas-profiling) (1.3.0)
Requirement already satisfied: pluggy<1.0,>=0.12 in c:\programdata\anaconda3\lib\site-packages (from pytest>=4.0.2->phik>=0.9.8->pandas-profiling) (0.12.0)
Requirement already satisfied: importlib-metadata>=0.12 in c:\programdata\anaconda3\lib\site-packages (from pytest>=4.0.2->phik>=0.9.8->pandas-profiling) (0.17)
Requirement already satisfied: wcwidth in c:\programdata\anaconda3\lib\site-packages (from pytest>=4.0.2->phik>=0.9.8->pandas-profiling) (0.1.7)
Requirement already satisfied: colorama in c:\programdata\anaconda3\lib\site-packages (from pytest>=4.0.2->phik>=0.9.8->pandas-profiling) (0.4.1)
Requirement already satisfied: tornado>=4.1 in c:\programdata\anaconda3\lib\site-packages (from jupyter-client>=5.2.3->phik>=0.9.8->pandas-profiling) (6.0.3)
Requirement already satisfied: pyzmq>=13 in c:\programdata\anaconda3\lib\site-packages (from jupyter-client>=5.2.3->phik>=0.9.8->pandas-profiling) (18.0.0)
Requirement already satisfied: traitlets in c:\programdata\anaconda3\lib\site-packages (from jupyter-client>=5.2.3->phik>=0.9.8->pandas-profiling) (4.3.2)
Requirement already satisfied: jupyter-core in c:\programdata\anaconda3\lib\site-packages (from jupyter-client>=5.2.3->phik>=0.9.8->pandas-profiling) (4.5.0)
Requirement already satisfied: pylint>=1.4.5 in c:\programdata\anaconda3\lib\site-packages (from pytest-pylint>=0.13.0->phik>=0.9.8->pandas-profiling) (2.3.1)
Requirement already satisfied: six in c:\programdata\anaconda3\lib\site-packages (from pytest-pylint>=0.13.0->phik>=0.9.8->pandas-profiling) (1.12.0)
Requirement already satisfied: nbformat>=4.4 in c:\programdata\anaconda3\lib\site-packages (from nbconvert>=5.3.1->phik>=0.9.8->pandas-profiling) (4.4.0)
Requirement already satisfied: entrypoints>=0.2.2 in c:\programdata\anaconda3\lib\site-packages (from nbconvert>=5.3.1->phik>=0.9.8->pandas-profiling) (0.3)
Requirement already satisfied: pandocfilters>=1.4.1 in c:\programdata\anaconda3\lib\site-packages (from nbconvert>=5.3.1->phik>=0.9.8->pandas-profiling) (1.4.2)
Requirement already satisfied: testpath in c:\programdata\anaconda3\lib\site-packages (from nbconvert>=5.3.1->phik>=0.9.8->pandas-profiling) (0.4.2)
Requirement already satisfied: mistune>=0.8.1 in c:\programdata\anaconda3\lib\site-packages (from nbconvert>=5.3.1->phik>=0.9.8->pandas-profiling) (0.8.4)
Requirement already satisfied: pygments in c:\programdata\anaconda3\lib\site-packages (from nbconvert>=5.3.1->phik>=0.9.8->pandas-profiling) (2.4.2)
Requirement already satisfied: defusedxml in c:\programdata\anaconda3\lib\site-packages (from nbconvert>=5.3.1->phik>=0.9.8->pandas-profiling) (0.6.0)
Requirement already satisfied: bleach in c:\programdata\anaconda3\lib\site-packages (from nbconvert>=5.3.1->phik>=0.9.8->pandas-profiling) (3.1.0)
Requirement already satisfied: setuptools in c:\programdata\anaconda3\lib\site-packages (from kiwisolver>=1.0.1->matplotlib>=1.4->pandas-profiling) (41.0.1)
Requirement already satisfied: zipp>=0.5 in c:\programdata\anaconda3\lib\site-packages (from importlib-metadata>=0.12->pytest>=4.0.2->phik>=0.9.8->pandas-profiling) (0.5.1)
Requirement already satisfied: decorator in c:\programdata\anaconda3\lib\site-packages (from traitlets->jupyter-client>=5.2.3->phik>=0.9.8->pandas-profiling) (4.4.0)
Requirement already satisfied: ipython-genutils in c:\programdata\anaconda3\lib\site-packages (from traitlets->jupyter-client>=5.2.3->phik>=0.9.8->pandas-profiling) (0.2.0)
Requirement already satisfied: astroid<3,>=2.2.0 in c:\programdata\anaconda3\lib\site-packages (from pylint>=1.4.5->pytest-pylint>=0.13.0->phik>=0.9.8->pandas-profiling) (2.2.5)
Requirement already satisfied: isort<5,>=4.2.5 in c:\programdata\anaconda3\lib\site-packages (from pylint>=1.4.5->pytest-pylint>=0.13.0->phik>=0.9.8->pandas-profiling) (4.3.21)
Requirement already satisfied: mccabe<0.7,>=0.6 in c:\programdata\anaconda3\lib\site-packages (from pylint>=1.4.5->pytest-pylint>=0.13.0->phik>=0.9.8->pandas-profiling) (0.6.1)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in c:\programdata\anaconda3\lib\site-packages (from nbformat>=4.4->nbconvert>=5.3.1->phik>=0.9.8->pandas-profiling) (3.0.1)
Requirement already satisfied: webencodings in c:\programdata\anaconda3\lib\site-packages (from bleach->nbconvert>=5.3.1->phik>=0.9.8->pandas-profiling) (0.5.1)
Requirement already satisfied: typed-ast>=1.3.0; implementation_name == "cpython" in c:\programdata\anaconda3\lib\site-packages (from astroid<3,>=2.2.0->pylint>=1.4.5->pytest-pylint>=0.13.0->phik>=0.9.8->pandas-profiling) (1.4.0)
Requirement already satisfied: lazy-object-proxy in c:\programdata\anaconda3\lib\site-packages (from astroid<3,>=2.2.0->pylint>=1.4.5->pytest-pylint>=0.13.0->phik>=0.9.8->pandas-profiling) (1.4.1)
Requirement already satisfied: wrapt in c:\programdata\anaconda3\lib\site-packages (from astroid<3,>=2.2.0->pylint>=1.4.5->pytest-pylint>=0.13.0->phik>=0.9.8->pandas-profiling) (1.11.2)
Requirement already satisfied: pyrsistent>=0.14.0 in c:\programdata\anaconda3\lib\site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.4->nbconvert>=5.3.1->phik>=0.9.8->pandas-profiling) (0.14.11)

Then, simply run the ProfileReport() function and pass it any DataFrame. It returns an interactive HTML report:

  • The first section is an overview of the dataset and a list of possible issues with the data.
  • The next section gives a summary of each column. You can click "toggle details" for even more information.
  • The third section shows a heatmap of the correlation between columns.
  • And the fourth section shows the head of the dataset.
In [241]:
import pandas as pd
import pandas_profiling
df_who= pd.read_csv('WHO_csv.csv')
In [242]:
pandas_profiling.ProfileReport(df_who)
Out[242]:

11.17 Slicing Subsets of Rows and Columns in Pandas

In [243]:
from IPython.display import YouTubeVideo
YouTubeVideo('zxqjeyKP2Tk',width=900, height=500)
Out[243]:
In [244]:
from IPython.display import YouTubeVideo
YouTubeVideo('OYZNk7Z9s6I',width=900, height=500)
Out[244]:
In [245]:
from IPython.display import YouTubeVideo
YouTubeVideo('xvpNA7bC8cs',width=900, height=500)
Out[245]:
In [246]:
df_who= pd.read_csv('WHO_csv.csv')
df_who.head()
Out[246]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
In [247]:
df_who.at[1, 'Country']
Out[247]:
'Albania'
In [248]:
df_who.at[3, 'Population']=780
In [249]:
df_who.head()
Out[249]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 780 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
  • Get value within a Series
In [250]:
df_who.loc[3].at['Population']
Out[250]:
780

iat

  • Access a single value for a row/column pair by integer position.
  • Similar to iloc, in that both provide integer-based lookups.
  • Use iat if you only need to get or set a single value in a DataFrame or Series.
  • Get value at specified row/column pair
In [251]:
df_who= pd.read_csv('WHO_csv.csv')
df_who.head()
Out[251]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
In [252]:
df_who.iat[0, 0]
Out[252]:
'Afghanistan'
In [253]:
df_who.iat[2, 3]=880
In [254]:
df_who.head()
Out[254]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 880.00 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
  • Get value within a Series
In [255]:
df_who.loc[0].iat[0]
Out[255]:
'Afghanistan'
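
With the default integer index used above, at and iat look interchangeable because row labels and row positions coincide. The following is a minimal sketch of the difference, assuming a small hand-built sample (the three rows are copied from the head() output above; the string row labels "AFG", "ALB" and "DZA" are hypothetical and not part of WHO_csv.csv):

```python
import pandas as pd

# Hypothetical three-row sample; values copied from the df_who.head() output.
sample = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Albania", "Algeria"],
        "Population": [29825, 3162, 38482],
    },
    index=["AFG", "ALB", "DZA"],  # assumed string labels, for illustration only
)

# at: look up a single value by row label and column label
by_label = sample.at["ALB", "Population"]

# iat: look up a single value by integer position (row 1, column 1)
by_position = sample.iat[1, 1]

# Both accessors can also assign a single value
sample.at["DZA", "Population"] = 40000
sample.iat[0, 1] = 30000

print(by_label, by_position)
print(sample)
```

Both lookups return the same cell here; only the way the cell is addressed differs.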

ix

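  • A primarily label-location based indexer, with integer position fallback.
  • Deprecated since pandas 0.20.0 and removed in pandas 1.0. The cells below run only on older versions; use loc (labels) or iloc (positions) instead.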
In [256]:
df_who= pd.read_csv('WHO_csv.csv')
df_who.head()
Out[256]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
In [257]:
df_who.ix[10:20, ['Country','Over60']]
Out[257]:
Country Over60
10 Azerbaijan 8.24
11 Bahamas 11.24
12 Bahrain 3.38
13 Bangladesh 6.89
14 Barbados 15.78
15 Belarus 19.31
16 Belgium 23.81
17 Belize 5.74
18 Benin 4.54
19 Bhutan 6.90
20 Bolivia (Plurinational State of) 7.28
In [258]:
df_who.ix[10:15,]
Out[258]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
10 Azerbaijan Europe 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
12 Bahrain Eastern Mediterranean 1318 20.16 3.38 2.12 79 9.6 127.96 91.9 NaN NaN NaN
13 Bangladesh South-East Asia 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN
14 Barbados Americas 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
15 Belarus Europe 9405 15.10 19.31 1.47 71 5.2 111.88 NaN 14460.0 NaN NaN
In [259]:
df_who.ix[:4, 1:3]
Out[259]:
Region Population
0 Eastern Mediterranean 29825
1 Europe 3162
2 Africa 38482
3 Europe 78
4 Africa 20821
In [260]:
df_who.ix[[1,3,5], [1,8]]
Out[260]:
Region CellularSubscribers
1 Europe 96.39
3 Europe 75.49
5 Americas 196.41
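
Because ix is no longer available in current pandas releases, the calls above can be rewritten with loc and iloc. This is a rough sketch on a hypothetical six-row sample (rows taken from the outputs above, with fewer columns than WHO_csv.csv), so the slice bounds and column positions are adapted to the smaller table:

```python
import pandas as pd

# Hypothetical six-row sample built from rows shown in the outputs above.
sample = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola",
                    "Antigua and Barbuda"],
        "Region": ["Eastern Mediterranean", "Europe", "Africa", "Europe", "Africa",
                   "Americas"],
        "Population": [29825, 3162, 38482, 78, 20821, 89],
        "Over60": [3.82, 14.93, 7.17, 22.86, 3.84, 12.35],
    }
)

# Was: sample.ix[1:3, ['Country', 'Over60']]  (row labels, column labels)
by_label = sample.loc[1:3, ["Country", "Over60"]]

# Was: sample.ix[:4, 1:3]  (row labels 0..4, column positions 1 and 2)
mixed = sample.loc[:4, sample.columns[1:3]]

# Was: sample.ix[[1, 3, 5], [1, 2]]  (row and column positions)
by_position = sample.iloc[[1, 3, 5], [1, 2]]

print(by_label)
print(mixed)
print(by_position)
```

The second call is the one that needs care: ix treated the row slice as labels (end included) and the column slice as positions (end excluded), so the replacement mixes loc with a positional slice of sample.columns.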

filter

In [261]:
df_who= pd.read_csv('WHO_csv.csv')
df_who.head()
Out[261]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
In [262]:
df_who.filter(['Country', 'CellularSubscribers','Population']).head()
Out[262]:
Country CellularSubscribers Population
0 Afghanistan 54.26 29825
1 Albania 96.39 3162
2 Algeria 98.99 38482
3 Andorra 75.49 78
4 Angola 48.38 20821
In [263]:
df_who.filter(regex ='[cC]').head()
Out[263]:
Country LifeExpectancy ChildMortality CellularSubscribers LiteracyRate PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan 60 98.5 54.26 NaN NaN NaN
1 Albania 74 16.7 96.39 NaN NaN NaN
2 Algeria 73 20.0 98.99 NaN 98.2 96.4
3 Andorra 82 3.2 75.49 NaN 78.4 79.4
4 Angola 51 163.5 48.38 70.1 93.1 78.2
In [264]:
df_who.filter(like ='Enr').head()
Out[264]:
PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 NaN NaN
1 NaN NaN
2 98.2 96.4
3 78.4 79.4
4 93.1 78.2
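
Note that filter selects by label, not by the values stored in the cells. The sketch below, on a hypothetical three-row sample indexed by country name (an assumption made for illustration; df_who itself uses an integer index), shows the like and regex arguments and the axis=0 variant that matches row labels:

```python
import pandas as pd

# Hypothetical sample; values copied from the df_who.head() output above.
sample = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Albania", "Algeria"],
        "Population": [29825, 3162, 38482],
        "PrimarySchoolEnrollmentMale": [None, None, 98.2],
        "PrimarySchoolEnrollmentFemale": [None, None, 96.4],
    }
).set_index("Country")

# like: keep columns whose name contains the substring
enrollment = sample.filter(like="Enr")

# regex: keep columns whose name matches the pattern (case-sensitive search)
ends_with_male = sample.filter(regex="Male$")

# axis=0 applies the same matching to the row labels instead of the columns
al_countries = sample.filter(regex="^Al", axis=0)

print(enrollment)
print(ends_with_male)
print(al_countries)
```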

Subset of All Rows and Columns

In [265]:
df_who= pd.read_csv('WHO_csv.csv')
df_who[:]
Out[265]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
6 Argentina Americas 41087 24.42 14.97 2.20 76 14.2 134.92 97.8 17130.0 NaN NaN
7 Armenia Europe 2969 20.34 14.06 1.74 71 16.4 103.57 99.6 6100.0 NaN NaN
8 Australia Western Pacific 23050 18.95 19.46 1.89 82 4.9 108.34 NaN 38110.0 96.9 97.5
9 Austria Europe 8464 14.51 23.52 1.44 81 4.0 154.78 NaN 42050.0 NaN NaN
10 Azerbaijan Europe 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
12 Bahrain Eastern Mediterranean 1318 20.16 3.38 2.12 79 9.6 127.96 91.9 NaN NaN NaN
13 Bangladesh South-East Asia 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN
14 Barbados Americas 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
15 Belarus Europe 9405 15.10 19.31 1.47 71 5.2 111.88 NaN 14460.0 NaN NaN
16 Belgium Europe 11060 16.88 23.81 1.85 80 4.2 116.61 NaN 39190.0 98.9 99.2
17 Belize Americas 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
18 Benin Africa 10051 42.95 4.54 5.01 57 89.5 85.33 42.4 1620.0 NaN NaN
19 Bhutan South-East Asia 742 28.53 6.90 2.32 67 44.6 65.58 NaN 5570.0 88.3 91.5
20 Bolivia (Plurinational State of) Americas 10496 35.23 7.28 3.31 67 41.4 82.82 NaN 4890.0 91.2 91.5
21 Bosnia and Herzegovina Europe 3834 16.35 20.52 1.26 76 6.7 84.52 97.9 9190.0 86.5 88.4
22 Botswana Africa 2004 33.75 5.63 2.71 66 53.3 142.82 84.5 14550.0 NaN NaN
23 Brazil Americas 199000 24.56 10.81 1.82 74 14.4 124.26 NaN 11420.0 NaN NaN
24 Brunei Darussalam Western Pacific 412 25.75 7.03 2.03 77 8.0 109.17 95.2 NaN NaN NaN
25 Bulgaria Europe 7278 13.53 26.11 1.51 74 12.1 140.68 NaN 14160.0 99.3 99.7
26 Burkina Faso Africa 16460 45.66 3.88 5.78 56 102.4 45.27 NaN 1300.0 60.7 55.9
27 Burundi Africa 9850 44.20 3.87 6.21 53 104.3 22.33 67.2 610.0 NaN NaN
28 Cambodia Western Pacific 14865 31.23 7.67 2.93 65 39.7 96.17 NaN 2230.0 96.4 95.4
29 Cameroon Africa 21700 43.08 4.89 4.94 53 94.9 52.35 NaN 2330.0 99.6 87.4
... ... ... ... ... ... ... ... ... ... ... ... ... ...
164 Suriname Americas 535 27.83 9.55 2.32 72 20.8 178.88 94.7 NaN NaN NaN
165 Swaziland Africa 1231 38.05 5.34 3.48 50 79.7 63.70 87.4 5930.0 NaN NaN
166 Sweden Europe 9511 16.71 25.32 1.93 82 2.9 118.57 NaN 42200.0 99.7 99.0
167 Switzerland Europe 7997 14.79 23.25 1.51 83 4.3 131.43 NaN 52570.0 98.9 99.5
168 Syrian Arab Republic Eastern Mediterranean 21890 35.35 6.09 3.04 75 15.1 63.17 83.4 NaN NaN NaN
169 Tajikistan Europe 8009 35.75 4.80 3.81 68 58.3 90.64 99.7 2300.0 99.5 96.0
170 Thailand South-East Asia 66785 18.47 13.96 1.43 74 13.2 111.63 NaN 8360.0 NaN NaN
171 The former Yugoslav Republic of Macedonia Europe 2106 16.89 17.56 1.44 75 7.4 107.24 97.3 11090.0 97.3 99.2
172 Timor-Leste South-East Asia 1114 46.33 5.16 6.11 64 56.7 53.23 58.3 NaN 86.2 85.6
173 Togo Africa 6643 41.89 4.44 4.75 56 95.5 50.45 NaN 1040.0 NaN NaN
174 Tonga Western Pacific 105 37.33 7.96 3.86 72 12.8 52.63 NaN 5000.0 NaN NaN
175 Trinidad and Tobago Americas 1337 20.73 13.18 1.80 71 20.7 135.64 98.8 NaN 97.7 97.0
176 Tunisia Eastern Mediterranean 10875 23.22 10.49 2.04 76 16.1 116.93 NaN 9030.0 NaN NaN
177 Turkey Europe 73997 26.00 10.56 2.08 76 14.2 88.70 NaN 16940.0 99.5 98.3
178 Turkmenistan Europe 5173 28.65 6.30 2.38 63 52.8 68.77 99.6 8690.0 NaN NaN
179 Tuvalu Western Pacific 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN
180 Uganda Africa 36346 48.54 3.72 6.06 56 68.9 48.38 73.2 1310.0 89.7 92.3
181 Ukraine Europe 45530 14.18 20.76 1.45 71 10.7 122.98 99.7 7040.0 90.8 91.5
182 United Arab Emirates Eastern Mediterranean 9206 14.41 0.81 1.84 76 8.4 148.62 NaN 47890.0 NaN NaN
183 United Kingdom Europe 62783 17.54 23.06 1.90 80 4.8 130.75 NaN 36010.0 99.8 99.6
184 United Republic of Tanzania Africa 47783 44.85 4.89 5.36 59 54.0 55.53 73.2 1500.0 NaN NaN
185 United States of America Americas 318000 19.63 19.31 2.00 79 7.1 92.72 NaN 48820.0 95.4 96.1
186 Uruguay Americas 3395 22.05 18.59 2.07 77 7.2 140.75 98.1 14640.0 NaN NaN
187 Uzbekistan Europe 28541 28.90 6.38 2.38 68 39.6 91.65 99.4 3420.0 93.3 91.0
188 Vanuatu Western Pacific 247 37.37 6.02 3.46 72 17.9 55.76 82.6 4330.0 NaN NaN
189 Venezuela (Bolivarian Republic of) Americas 29955 28.84 9.17 2.44 75 15.3 97.78 NaN 12430.0 94.7 95.1
190 Viet Nam Western Pacific 90796 22.87 9.32 1.79 75 23.0 143.39 93.2 3250.0 NaN NaN
191 Yemen Eastern Mediterranean 23852 40.72 4.54 4.35 64 60.0 47.05 63.9 2170.0 85.5 70.5
192 Zambia Africa 14075 46.73 3.95 5.77 55 88.5 60.59 71.2 1490.0 91.4 93.9
193 Zimbabwe Africa 13724 40.24 5.68 3.64 54 89.8 72.13 92.2 NaN NaN NaN

194 rows × 13 columns

Subset of Specific Rows

In [266]:
df_who[10:14]
Out[266]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
10 Azerbaijan Europe 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
12 Bahrain Eastern Mediterranean 1318 20.16 3.38 2.12 79 9.6 127.96 91.9 NaN NaN NaN
13 Bangladesh South-East Asia 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN

loc and iloc

  • loc: Access a group of rows and columns by label(s) or a boolean array.
  • iloc: Purely integer-location based indexing for selection by position.
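
One practical consequence of the two definitions above is that slices behave differently: iloc excludes the stop position, while loc includes the stop label. A minimal sketch on a hypothetical five-row sample (values copied from the head() output; the variable names are illustrative):

```python
import pandas as pd

# Hypothetical five-row sample; values copied from the df_who.head() output.
sample = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
        "Population": [29825, 3162, 38482, 78, 20821],
    }
)

# iloc slices by position and excludes the stop position: rows 1, 2, 3
by_position = sample.iloc[1:4]

# loc slices by label and includes the stop label: rows 1, 2, 3, 4
by_label = sample.loc[1:4]

# After reordering the rows, labels and positions no longer coincide
shuffled = sample.sort_values("Population")
first_by_position = shuffled.iloc[0]["Country"]  # row with the smallest population
first_by_label = shuffled.loc[0, "Country"]      # row whose label is 0

print(len(by_position), len(by_label))
print(first_by_position, first_by_label)
```

As long as the index is the default 0, 1, 2, ... the two accessors pick the same rows for a single integer, which is why the difference is easy to miss in the cells that follow.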

Subset Specific Rows (Using iloc)

In [267]:
df_who.iloc[10:14]
Out[267]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
10 Azerbaijan Europe 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
12 Bahrain Eastern Mediterranean 1318 20.16 3.38 2.12 79 9.6 127.96 91.9 NaN NaN NaN
13 Bangladesh South-East Asia 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN

Subset Specific Rows (Using loc)

In [268]:
df_who.loc[[2,4,6]]
Out[268]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
6 Argentina Americas 41087 24.42 14.97 2.20 76 14.2 134.92 97.8 17130.0 NaN NaN

Subset Specific Columns (Using iloc)

In [269]:
df_who.iloc[:,1:4]
Out[269]:
Region Population Under15
0 Eastern Mediterranean 29825 47.42
1 Europe 3162 21.33
2 Africa 38482 27.42
3 Europe 78 15.20
4 Africa 20821 47.58
5 Americas 89 25.96
6 Americas 41087 24.42
7 Europe 2969 20.34
8 Western Pacific 23050 18.95
9 Europe 8464 14.51
10 Europe 9309 22.25
11 Americas 372 21.62
12 Eastern Mediterranean 1318 20.16
13 South-East Asia 155000 30.57
14 Americas 283 18.99
15 Europe 9405 15.10
16 Europe 11060 16.88
17 Americas 324 34.40
18 Africa 10051 42.95
19 South-East Asia 742 28.53
20 Americas 10496 35.23
21 Europe 3834 16.35
22 Africa 2004 33.75
23 Americas 199000 24.56
24 Western Pacific 412 25.75
25 Europe 7278 13.53
26 Africa 16460 45.66
27 Africa 9850 44.20
28 Western Pacific 14865 31.23
29 Africa 21700 43.08
... ... ... ...
164 Americas 535 27.83
165 Africa 1231 38.05
166 Europe 9511 16.71
167 Europe 7997 14.79
168 Eastern Mediterranean 21890 35.35
169 Europe 8009 35.75
170 South-East Asia 66785 18.47
171 Europe 2106 16.89
172 South-East Asia 1114 46.33
173 Africa 6643 41.89
174 Western Pacific 105 37.33
175 Americas 1337 20.73
176 Eastern Mediterranean 10875 23.22
177 Europe 73997 26.00
178 Europe 5173 28.65
179 Western Pacific 10 30.61
180 Africa 36346 48.54
181 Europe 45530 14.18
182 Eastern Mediterranean 9206 14.41
183 Europe 62783 17.54
184 Africa 47783 44.85
185 Americas 318000 19.63
186 Americas 3395 22.05
187 Europe 28541 28.90
188 Western Pacific 247 37.37
189 Americas 29955 28.84
190 Western Pacific 90796 22.87
191 Eastern Mediterranean 23852 40.72
192 Africa 14075 46.73
193 Africa 13724 40.24

194 rows × 3 columns

Subset Specific Rows and Columns (Using iloc)

In [270]:
df_who.iloc[14:25, 1:4]
Out[270]:
Region Population Under15
14 Americas 283 18.99
15 Europe 9405 15.10
16 Europe 11060 16.88
17 Americas 324 34.40
18 Africa 10051 42.95
19 South-East Asia 742 28.53
20 Americas 10496 35.23
21 Europe 3834 16.35
22 Africa 2004 33.75
23 Americas 199000 24.56
24 Western Pacific 412 25.75

Subset Specific Column (Using Variable Name) - Single Variable

Method-1 (attribute access)

In [271]:
df_who.Country
Out[271]:
0                             Afghanistan
1                                 Albania
2                                 Algeria
3                                 Andorra
4                                  Angola
                      ...                
189    Venezuela (Bolivarian Republic of)
190                              Viet Nam
191                                 Yemen
192                                Zambia
193                              Zimbabwe
Name: Country, Length: 194, dtype: object

Method-2 (bracket notation)

In [272]:
df_who['Country']
Out[272]:
0                             Afghanistan
1                                 Albania
2                                 Algeria
3                                 Andorra
4                                  Angola
                      ...                
189    Venezuela (Bolivarian Republic of)
190                              Viet Nam
191                                 Yemen
192                                Zambia
193                              Zimbabwe
Name: Country, Length: 194, dtype: object

Subset Specific Columns (Using Variable Name) - Multiple Variables

In [273]:
df_who[['Country', 'Region', 'Population']]
Out[273]:
Country Region Population
0 Afghanistan Eastern Mediterranean 29825
1 Albania Europe 3162
2 Algeria Africa 38482
3 Andorra Europe 78
4 Angola Africa 20821
5 Antigua and Barbuda Americas 89
6 Argentina Americas 41087
7 Armenia Europe 2969
8 Australia Western Pacific 23050
9 Austria Europe 8464
10 Azerbaijan Europe 9309
11 Bahamas Americas 372
12 Bahrain Eastern Mediterranean 1318
13 Bangladesh South-East Asia 155000
14 Barbados Americas 283
15 Belarus Europe 9405
16 Belgium Europe 11060
17 Belize Americas 324
18 Benin Africa 10051
19 Bhutan South-East Asia 742
20 Bolivia (Plurinational State of) Americas 10496
21 Bosnia and Herzegovina Europe 3834
22 Botswana Africa 2004
23 Brazil Americas 199000
24 Brunei Darussalam Western Pacific 412
25 Bulgaria Europe 7278
26 Burkina Faso Africa 16460
27 Burundi Africa 9850
28 Cambodia Western Pacific 14865
29 Cameroon Africa 21700
... ... ... ...
164 Suriname Americas 535
165 Swaziland Africa 1231
166 Sweden Europe 9511
167 Switzerland Europe 7997
168 Syrian Arab Republic Eastern Mediterranean 21890
169 Tajikistan Europe 8009
170 Thailand South-East Asia 66785
171 The former Yugoslav Republic of Macedonia Europe 2106
172 Timor-Leste South-East Asia 1114
173 Togo Africa 6643
174 Tonga Western Pacific 105
175 Trinidad and Tobago Americas 1337
176 Tunisia Eastern Mediterranean 10875
177 Turkey Europe 73997
178 Turkmenistan Europe 5173
179 Tuvalu Western Pacific 10
180 Uganda Africa 36346
181 Ukraine Europe 45530
182 United Arab Emirates Eastern Mediterranean 9206
183 United Kingdom Europe 62783
184 United Republic of Tanzania Africa 47783
185 United States of America Americas 318000
186 Uruguay Americas 3395
187 Uzbekistan Europe 28541
188 Vanuatu Western Pacific 247
189 Venezuela (Bolivarian Republic of) Americas 29955
190 Viet Nam Western Pacific 90796
191 Yemen Eastern Mediterranean 23852
192 Zambia Africa 14075
193 Zimbabwe Africa 13724

194 rows × 3 columns

Subset Specific Rows and Columns (Using Variable Name & loc)

In [274]:
df_who.loc[1:4,['Country', 'Region', 'Population']]
Out[274]:
Country Region Population
1 Albania Europe 3162
2 Algeria Africa 38482
3 Andorra Europe 78
4 Angola Africa 20821
In [275]:
df_who.loc[[2,4,50,52],['Country', 'Region', 'Population']]
Out[275]:
Country Region Population
2 Algeria Africa 38482
4 Angola Africa 20821
50 Dominica Americas 72
52 Ecuador Americas 15492

Example

In [276]:
df_who.loc[df_who.Population>360000,['Country']]
Out[276]:
Country
35 China
77 India

Example

In [277]:
df_who.loc[df_who.Population>360000,['Country','Population']]
Out[277]:
Country Population
35 China 1390000
77 India 1240000
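
The two examples above pass a boolean condition as the row selector of loc. As a rough sketch of how such conditions can be combined, assuming a hypothetical five-row sample (values copied from earlier outputs; the 5000 threshold is arbitrary):

```python
import pandas as pd

# Hypothetical five-row sample; values copied from earlier outputs.
sample = pd.DataFrame(
    {
        "Country": ["Albania", "Algeria", "Andorra", "Angola", "Austria"],
        "Region": ["Europe", "Africa", "Europe", "Africa", "Europe"],
        "Population": [3162, 38482, 78, 20821, 8464],
    }
)

# A comparison on a column yields a boolean Series aligned with the rows
is_large = sample["Population"] > 5000

# Combine conditions with & (and) or | (or); each condition needs parentheses
large_european = sample.loc[
    (sample["Population"] > 5000) & (sample["Region"] == "Europe"),
    ["Country", "Population"],
]

# ~ negates a mask
not_european = sample.loc[~(sample["Region"] == "Europe"), "Country"]

print(is_large.tolist())
print(large_european)
print(not_european.tolist())
```

Python's `and` and `or` keywords do not work element-wise on a Series, which is why the bitwise operators are used here.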

Reverse row order

In [278]:
df_who.loc[::-1]
Out[278]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
193 Zimbabwe Africa 13724 40.24 5.68 3.64 54 89.8 72.13 92.2 NaN NaN NaN
192 Zambia Africa 14075 46.73 3.95 5.77 55 88.5 60.59 71.2 1490.0 91.4 93.9
191 Yemen Eastern Mediterranean 23852 40.72 4.54 4.35 64 60.0 47.05 63.9 2170.0 85.5 70.5
190 Viet Nam Western Pacific 90796 22.87 9.32 1.79 75 23.0 143.39 93.2 3250.0 NaN NaN
189 Venezuela (Bolivarian Republic of) Americas 29955 28.84 9.17 2.44 75 15.3 97.78 NaN 12430.0 94.7 95.1
188 Vanuatu Western Pacific 247 37.37 6.02 3.46 72 17.9 55.76 82.6 4330.0 NaN NaN
187 Uzbekistan Europe 28541 28.90 6.38 2.38 68 39.6 91.65 99.4 3420.0 93.3 91.0
186 Uruguay Americas 3395 22.05 18.59 2.07 77 7.2 140.75 98.1 14640.0 NaN NaN
185 United States of America Americas 318000 19.63 19.31 2.00 79 7.1 92.72 NaN 48820.0 95.4 96.1
184 United Republic of Tanzania Africa 47783 44.85 4.89 5.36 59 54.0 55.53 73.2 1500.0 NaN NaN
183 United Kingdom Europe 62783 17.54 23.06 1.90 80 4.8 130.75 NaN 36010.0 99.8 99.6
182 United Arab Emirates Eastern Mediterranean 9206 14.41 0.81 1.84 76 8.4 148.62 NaN 47890.0 NaN NaN
181 Ukraine Europe 45530 14.18 20.76 1.45 71 10.7 122.98 99.7 7040.0 90.8 91.5
180 Uganda Africa 36346 48.54 3.72 6.06 56 68.9 48.38 73.2 1310.0 89.7 92.3
179 Tuvalu Western Pacific 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN
178 Turkmenistan Europe 5173 28.65 6.30 2.38 63 52.8 68.77 99.6 8690.0 NaN NaN
177 Turkey Europe 73997 26.00 10.56 2.08 76 14.2 88.70 NaN 16940.0 99.5 98.3
176 Tunisia Eastern Mediterranean 10875 23.22 10.49 2.04 76 16.1 116.93 NaN 9030.0 NaN NaN
175 Trinidad and Tobago Americas 1337 20.73 13.18 1.80 71 20.7 135.64 98.8 NaN 97.7 97.0
174 Tonga Western Pacific 105 37.33 7.96 3.86 72 12.8 52.63 NaN 5000.0 NaN NaN
173 Togo Africa 6643 41.89 4.44 4.75 56 95.5 50.45 NaN 1040.0 NaN NaN
172 Timor-Leste South-East Asia 1114 46.33 5.16 6.11 64 56.7 53.23 58.3 NaN 86.2 85.6
171 The former Yugoslav Republic of Macedonia Europe 2106 16.89 17.56 1.44 75 7.4 107.24 97.3 11090.0 97.3 99.2
170 Thailand South-East Asia 66785 18.47 13.96 1.43 74 13.2 111.63 NaN 8360.0 NaN NaN
169 Tajikistan Europe 8009 35.75 4.80 3.81 68 58.3 90.64 99.7 2300.0 99.5 96.0
168 Syrian Arab Republic Eastern Mediterranean 21890 35.35 6.09 3.04 75 15.1 63.17 83.4 NaN NaN NaN
167 Switzerland Europe 7997 14.79 23.25 1.51 83 4.3 131.43 NaN 52570.0 98.9 99.5
166 Sweden Europe 9511 16.71 25.32 1.93 82 2.9 118.57 NaN 42200.0 99.7 99.0
165 Swaziland Africa 1231 38.05 5.34 3.48 50 79.7 63.70 87.4 5930.0 NaN NaN
164 Suriname Americas 535 27.83 9.55 2.32 72 20.8 178.88 94.7 NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ...
29 Cameroon Africa 21700 43.08 4.89 4.94 53 94.9 52.35 NaN 2330.0 99.6 87.4
28 Cambodia Western Pacific 14865 31.23 7.67 2.93 65 39.7 96.17 NaN 2230.0 96.4 95.4
27 Burundi Africa 9850 44.20 3.87 6.21 53 104.3 22.33 67.2 610.0 NaN NaN
26 Burkina Faso Africa 16460 45.66 3.88 5.78 56 102.4 45.27 NaN 1300.0 60.7 55.9
25 Bulgaria Europe 7278 13.53 26.11 1.51 74 12.1 140.68 NaN 14160.0 99.3 99.7
24 Brunei Darussalam Western Pacific 412 25.75 7.03 2.03 77 8.0 109.17 95.2 NaN NaN NaN
23 Brazil Americas 199000 24.56 10.81 1.82 74 14.4 124.26 NaN 11420.0 NaN NaN
22 Botswana Africa 2004 33.75 5.63 2.71 66 53.3 142.82 84.5 14550.0 NaN NaN
21 Bosnia and Herzegovina Europe 3834 16.35 20.52 1.26 76 6.7 84.52 97.9 9190.0 86.5 88.4
20 Bolivia (Plurinational State of) Americas 10496 35.23 7.28 3.31 67 41.4 82.82 NaN 4890.0 91.2 91.5
19 Bhutan South-East Asia 742 28.53 6.90 2.32 67 44.6 65.58 NaN 5570.0 88.3 91.5
18 Benin Africa 10051 42.95 4.54 5.01 57 89.5 85.33 42.4 1620.0 NaN NaN
17 Belize Americas 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
16 Belgium Europe 11060 16.88 23.81 1.85 80 4.2 116.61 NaN 39190.0 98.9 99.2
15 Belarus Europe 9405 15.10 19.31 1.47 71 5.2 111.88 NaN 14460.0 NaN NaN
14 Barbados Americas 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
13 Bangladesh South-East Asia 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN
12 Bahrain Eastern Mediterranean 1318 20.16 3.38 2.12 79 9.6 127.96 91.9 NaN NaN NaN
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
10 Azerbaijan Europe 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1
9 Austria Europe 8464 14.51 23.52 1.44 81 4.0 154.78 NaN 42050.0 NaN NaN
8 Australia Western Pacific 23050 18.95 19.46 1.89 82 4.9 108.34 NaN 38110.0 96.9 97.5
7 Armenia Europe 2969 20.34 14.06 1.74 71 16.4 103.57 99.6 6100.0 NaN NaN
6 Argentina Americas 41087 24.42 14.97 2.20 76 14.2 134.92 97.8 17130.0 NaN NaN
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN

194 rows × 13 columns

  • What if you also want the index to start at zero again after reversing the rows?
  • Use the reset_index() method with drop=True, which discards the old index instead of inserting it as a new column.
  • For a DataFrame with a multi-level index, reset_index() returns a new DataFrame with the index levels moved into columns named after the index names, defaulting to 'level_0', 'level_1', etc.

reset_index()

In [279]:
df_who.loc[::-1].reset_index(drop=True)
Out[279]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Zimbabwe Africa 13724 40.24 5.68 3.64 54 89.8 72.13 92.2 NaN NaN NaN
1 Zambia Africa 14075 46.73 3.95 5.77 55 88.5 60.59 71.2 1490.0 91.4 93.9
2 Yemen Eastern Mediterranean 23852 40.72 4.54 4.35 64 60.0 47.05 63.9 2170.0 85.5 70.5
3 Viet Nam Western Pacific 90796 22.87 9.32 1.79 75 23.0 143.39 93.2 3250.0 NaN NaN
4 Venezuela (Bolivarian Republic of) Americas 29955 28.84 9.17 2.44 75 15.3 97.78 NaN 12430.0 94.7 95.1
5 Vanuatu Western Pacific 247 37.37 6.02 3.46 72 17.9 55.76 82.6 4330.0 NaN NaN
6 Uzbekistan Europe 28541 28.90 6.38 2.38 68 39.6 91.65 99.4 3420.0 93.3 91.0
7 Uruguay Americas 3395 22.05 18.59 2.07 77 7.2 140.75 98.1 14640.0 NaN NaN
8 United States of America Americas 318000 19.63 19.31 2.00 79 7.1 92.72 NaN 48820.0 95.4 96.1
9 United Republic of Tanzania Africa 47783 44.85 4.89 5.36 59 54.0 55.53 73.2 1500.0 NaN NaN
10 United Kingdom Europe 62783 17.54 23.06 1.90 80 4.8 130.75 NaN 36010.0 99.8 99.6
11 United Arab Emirates Eastern Mediterranean 9206 14.41 0.81 1.84 76 8.4 148.62 NaN 47890.0 NaN NaN
12 Ukraine Europe 45530 14.18 20.76 1.45 71 10.7 122.98 99.7 7040.0 90.8 91.5
13 Uganda Africa 36346 48.54 3.72 6.06 56 68.9 48.38 73.2 1310.0 89.7 92.3
14 Tuvalu Western Pacific 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN
15 Turkmenistan Europe 5173 28.65 6.30 2.38 63 52.8 68.77 99.6 8690.0 NaN NaN
16 Turkey Europe 73997 26.00 10.56 2.08 76 14.2 88.70 NaN 16940.0 99.5 98.3
17 Tunisia Eastern Mediterranean 10875 23.22 10.49 2.04 76 16.1 116.93 NaN 9030.0 NaN NaN
18 Trinidad and Tobago Americas 1337 20.73 13.18 1.80 71 20.7 135.64 98.8 NaN 97.7 97.0
19 Tonga Western Pacific 105 37.33 7.96 3.86 72 12.8 52.63 NaN 5000.0 NaN NaN
20 Togo Africa 6643 41.89 4.44 4.75 56 95.5 50.45 NaN 1040.0 NaN NaN
21 Timor-Leste South-East Asia 1114 46.33 5.16 6.11 64 56.7 53.23 58.3 NaN 86.2 85.6
22 The former Yugoslav Republic of Macedonia Europe 2106 16.89 17.56 1.44 75 7.4 107.24 97.3 11090.0 97.3 99.2
23 Thailand South-East Asia 66785 18.47 13.96 1.43 74 13.2 111.63 NaN 8360.0 NaN NaN
24 Tajikistan Europe 8009 35.75 4.80 3.81 68 58.3 90.64 99.7 2300.0 99.5 96.0
25 Syrian Arab Republic Eastern Mediterranean 21890 35.35 6.09 3.04 75 15.1 63.17 83.4 NaN NaN NaN
26 Switzerland Europe 7997 14.79 23.25 1.51 83 4.3 131.43 NaN 52570.0 98.9 99.5
27 Sweden Europe 9511 16.71 25.32 1.93 82 2.9 118.57 NaN 42200.0 99.7 99.0
28 Swaziland Africa 1231 38.05 5.34 3.48 50 79.7 63.70 87.4 5930.0 NaN NaN
29 Suriname Americas 535 27.83 9.55 2.32 72 20.8 178.88 94.7 NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ...
164 Cameroon Africa 21700 43.08 4.89 4.94 53 94.9 52.35 NaN 2330.0 99.6 87.4
165 Cambodia Western Pacific 14865 31.23 7.67 2.93 65 39.7 96.17 NaN 2230.0 96.4 95.4
166 Burundi Africa 9850 44.20 3.87 6.21 53 104.3 22.33 67.2 610.0 NaN NaN
167 Burkina Faso Africa 16460 45.66 3.88 5.78 56 102.4 45.27 NaN 1300.0 60.7 55.9
168 Bulgaria Europe 7278 13.53 26.11 1.51 74 12.1 140.68 NaN 14160.0 99.3 99.7
169 Brunei Darussalam Western Pacific 412 25.75 7.03 2.03 77 8.0 109.17 95.2 NaN NaN NaN
170 Brazil Americas 199000 24.56 10.81 1.82 74 14.4 124.26 NaN 11420.0 NaN NaN
171 Botswana Africa 2004 33.75 5.63 2.71 66 53.3 142.82 84.5 14550.0 NaN NaN
172 Bosnia and Herzegovina Europe 3834 16.35 20.52 1.26 76 6.7 84.52 97.9 9190.0 86.5 88.4
173 Bolivia (Plurinational State of) Americas 10496 35.23 7.28 3.31 67 41.4 82.82 NaN 4890.0 91.2 91.5
174 Bhutan South-East Asia 742 28.53 6.90 2.32 67 44.6 65.58 NaN 5570.0 88.3 91.5
175 Benin Africa 10051 42.95 4.54 5.01 57 89.5 85.33 42.4 1620.0 NaN NaN
176 Belize Americas 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
177 Belgium Europe 11060 16.88 23.81 1.85 80 4.2 116.61 NaN 39190.0 98.9 99.2
178 Belarus Europe 9405 15.10 19.31 1.47 71 5.2 111.88 NaN 14460.0 NaN NaN
179 Barbados Americas 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
180 Bangladesh South-East Asia 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN
181 Bahrain Eastern Mediterranean 1318 20.16 3.38 2.12 79 9.6 127.96 91.9 NaN NaN NaN
182 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
183 Azerbaijan Europe 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1
184 Austria Europe 8464 14.51 23.52 1.44 81 4.0 154.78 NaN 42050.0 NaN NaN
185 Australia Western Pacific 23050 18.95 19.46 1.89 82 4.9 108.34 NaN 38110.0 96.9 97.5
186 Armenia Europe 2969 20.34 14.06 1.74 71 16.4 103.57 99.6 6100.0 NaN NaN
187 Argentina Americas 41087 24.42 14.97 2.20 76 14.2 134.92 97.8 17130.0 NaN NaN
188 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
189 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
190 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
191 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
192 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
193 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN

194 rows × 13 columns

  • As you can see, the rows are in reverse order but the index has been reset to the default integer index.
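
To make the effect of drop=True concrete, here is a minimal sketch comparing it with the default behaviour on a hypothetical three-row sample (values copied from the head() output):

```python
import pandas as pd

# Hypothetical three-row sample; values copied from the df_who.head() output.
sample = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Albania", "Algeria"],
        "Population": [29825, 3162, 38482],
    }
)

reversed_rows = sample.loc[::-1]

# Default: the old index is kept as a new column named "index"
kept = reversed_rows.reset_index()

# drop=True: the old index is discarded
dropped = reversed_rows.reset_index(drop=True)

print(kept)
print(dropped)
```

Keeping the old index can be useful when the original row numbers still matter, for example to trace a row back to its line in the source file.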

Reverse column order

In [280]:
df_who.loc[:, ::-1]
Out[280]:
PrimarySchoolEnrollmentFemale PrimarySchoolEnrollmentMale GNI LiteracyRate CellularSubscribers ChildMortality LifeExpectancy FertilityRate Over60 Under15 Population Region Country
0 NaN NaN 1140.0 NaN 54.26 98.5 60 5.40 3.82 47.42 29825 Eastern Mediterranean Afghanistan
1 NaN NaN 8820.0 NaN 96.39 16.7 74 1.75 14.93 21.33 3162 Europe Albania
2 96.4 98.2 8310.0 NaN 98.99 20.0 73 2.83 7.17 27.42 38482 Africa Algeria
3 79.4 78.4 NaN NaN 75.49 3.2 82 NaN 22.86 15.20 78 Europe Andorra
4 78.2 93.1 5230.0 70.1 48.38 163.5 51 6.10 3.84 47.58 20821 Africa Angola
5 84.5 91.1 17900.0 99.0 196.41 9.9 75 2.12 12.35 25.96 89 Americas Antigua and Barbuda
6 NaN NaN 17130.0 97.8 134.92 14.2 76 2.20 14.97 24.42 41087 Americas Argentina
7 NaN NaN 6100.0 99.6 103.57 16.4 71 1.74 14.06 20.34 2969 Europe Armenia
8 97.5 96.9 38110.0 NaN 108.34 4.9 82 1.89 19.46 18.95 23050 Western Pacific Australia
9 NaN NaN 42050.0 NaN 154.78 4.0 81 1.44 23.52 14.51 8464 Europe Austria
10 84.1 85.3 8960.0 NaN 108.75 35.2 71 1.96 8.24 22.25 9309 Europe Azerbaijan
11 NaN NaN NaN NaN 86.06 16.9 75 1.90 11.24 21.62 372 Americas Bahamas
12 NaN NaN NaN 91.9 127.96 9.6 79 2.12 3.38 20.16 1318 Eastern Mediterranean Bahrain
13 NaN NaN 1940.0 56.8 56.06 40.9 70 2.24 6.89 30.57 155000 South-East Asia Bangladesh
14 NaN NaN NaN NaN 127.01 18.4 78 1.84 15.78 18.99 283 Americas Barbados
15 NaN NaN 14460.0 NaN 111.88 5.2 71 1.47 19.31 15.10 9405 Europe Belarus
16 99.2 98.9 39190.0 NaN 116.61 4.2 80 1.85 23.81 16.88 11060 Europe Belgium
17 NaN NaN 6090.0 NaN 69.96 18.3 74 2.76 5.74 34.40 324 Americas Belize
18 NaN NaN 1620.0 42.4 85.33 89.5 57 5.01 4.54 42.95 10051 Africa Benin
19 91.5 88.3 5570.0 NaN 65.58 44.6 67 2.32 6.90 28.53 742 South-East Asia Bhutan
20 91.5 91.2 4890.0 NaN 82.82 41.4 67 3.31 7.28 35.23 10496 Americas Bolivia (Plurinational State of)
21 88.4 86.5 9190.0 97.9 84.52 6.7 76 1.26 20.52 16.35 3834 Europe Bosnia and Herzegovina
22 NaN NaN 14550.0 84.5 142.82 53.3 66 2.71 5.63 33.75 2004 Africa Botswana
23 NaN NaN 11420.0 NaN 124.26 14.4 74 1.82 10.81 24.56 199000 Americas Brazil
24 NaN NaN NaN 95.2 109.17 8.0 77 2.03 7.03 25.75 412 Western Pacific Brunei Darussalam
25 99.7 99.3 14160.0 NaN 140.68 12.1 74 1.51 26.11 13.53 7278 Europe Bulgaria
26 55.9 60.7 1300.0 NaN 45.27 102.4 56 5.78 3.88 45.66 16460 Africa Burkina Faso
27 NaN NaN 610.0 67.2 22.33 104.3 53 6.21 3.87 44.20 9850 Africa Burundi
28 95.4 96.4 2230.0 NaN 96.17 39.7 65 2.93 7.67 31.23 14865 Western Pacific Cambodia
29 87.4 99.6 2330.0 NaN 52.35 94.9 53 4.94 4.89 43.08 21700 Africa Cameroon
... ... ... ... ... ... ... ... ... ... ... ... ... ...
164 NaN NaN NaN 94.7 178.88 20.8 72 2.32 9.55 27.83 535 Americas Suriname
165 NaN NaN 5930.0 87.4 63.70 79.7 50 3.48 5.34 38.05 1231 Africa Swaziland
166 99.0 99.7 42200.0 NaN 118.57 2.9 82 1.93 25.32 16.71 9511 Europe Sweden
167 99.5 98.9 52570.0 NaN 131.43 4.3 83 1.51 23.25 14.79 7997 Europe Switzerland
168 NaN NaN NaN 83.4 63.17 15.1 75 3.04 6.09 35.35 21890 Eastern Mediterranean Syrian Arab Republic
169 96.0 99.5 2300.0 99.7 90.64 58.3 68 3.81 4.80 35.75 8009 Europe Tajikistan
170 NaN NaN 8360.0 NaN 111.63 13.2 74 1.43 13.96 18.47 66785 South-East Asia Thailand
171 99.2 97.3 11090.0 97.3 107.24 7.4 75 1.44 17.56 16.89 2106 Europe The former Yugoslav Republic of Macedonia
172 85.6 86.2 NaN 58.3 53.23 56.7 64 6.11 5.16 46.33 1114 South-East Asia Timor-Leste
173 NaN NaN 1040.0 NaN 50.45 95.5 56 4.75 4.44 41.89 6643 Africa Togo
174 NaN NaN 5000.0 NaN 52.63 12.8 72 3.86 7.96 37.33 105 Western Pacific Tonga
175 97.0 97.7 NaN 98.8 135.64 20.7 71 1.80 13.18 20.73 1337 Americas Trinidad and Tobago
176 NaN NaN 9030.0 NaN 116.93 16.1 76 2.04 10.49 23.22 10875 Eastern Mediterranean Tunisia
177 98.3 99.5 16940.0 NaN 88.70 14.2 76 2.08 10.56 26.00 73997 Europe Turkey
178 NaN NaN 8690.0 99.6 68.77 52.8 63 2.38 6.30 28.65 5173 Europe Turkmenistan
179 NaN NaN NaN NaN 21.63 29.7 64 NaN 9.07 30.61 10 Western Pacific Tuvalu
180 92.3 89.7 1310.0 73.2 48.38 68.9 56 6.06 3.72 48.54 36346 Africa Uganda
181 91.5 90.8 7040.0 99.7 122.98 10.7 71 1.45 20.76 14.18 45530 Europe Ukraine
182 NaN NaN 47890.0 NaN 148.62 8.4 76 1.84 0.81 14.41 9206 Eastern Mediterranean United Arab Emirates
183 99.6 99.8 36010.0 NaN 130.75 4.8 80 1.90 23.06 17.54 62783 Europe United Kingdom
184 NaN NaN 1500.0 73.2 55.53 54.0 59 5.36 4.89 44.85 47783 Africa United Republic of Tanzania
185 96.1 95.4 48820.0 NaN 92.72 7.1 79 2.00 19.31 19.63 318000 Americas United States of America
186 NaN NaN 14640.0 98.1 140.75 7.2 77 2.07 18.59 22.05 3395 Americas Uruguay
187 91.0 93.3 3420.0 99.4 91.65 39.6 68 2.38 6.38 28.90 28541 Europe Uzbekistan
188 NaN NaN 4330.0 82.6 55.76 17.9 72 3.46 6.02 37.37 247 Western Pacific Vanuatu
189 95.1 94.7 12430.0 NaN 97.78 15.3 75 2.44 9.17 28.84 29955 Americas Venezuela (Bolivarian Republic of)
190 NaN NaN 3250.0 93.2 143.39 23.0 75 1.79 9.32 22.87 90796 Western Pacific Viet Nam
191 70.5 85.5 2170.0 63.9 47.05 60.0 64 4.35 4.54 40.72 23852 Eastern Mediterranean Yemen
192 93.9 91.4 1490.0 71.2 60.59 88.5 55 5.77 3.95 46.73 14075 Africa Zambia
193 NaN NaN NaN 92.2 72.13 89.8 54 3.64 5.68 40.24 13724 Africa Zimbabwe

194 rows × 13 columns
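The column reversal shown above can be written in a few equivalent ways. The sketch below uses a hypothetical two-row `demo` frame (not part of the notebook) to compare label-based slicing, position-based slicing, and selection by a reversed list of column names.

```python
import pandas as pd

# Hypothetical frame standing in for df_who.
demo = pd.DataFrame({"Country": ["Afghanistan", "Albania"],
                     "Region": ["Eastern Mediterranean", "Europe"],
                     "Population": [29825, 3162]})

by_loc = demo.loc[:, ::-1]          # label-based slice with a step of -1
by_iloc = demo.iloc[:, ::-1]        # position-based slice with a step of -1
by_names = demo[demo.columns[::-1]] # select columns by the reversed name list
```

All three leave the rows and the index untouched; only the column order changes.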

idxmax

  • Return index of first occurrence of maximum over requested axis.
  • NA/null values are excluded.
In [281]:
from IPython.display import YouTubeVideo
YouTubeVideo('egdfGJaBIh0',width=900, height=500)
Out[281]:
In [282]:
df = pd.DataFrame({"A":[4, 5, 2, 6],  
                   "B":[11, 2, 115, 8], 
                   "C":[1, 8, 66, 4]})
df
Out[282]:
A B C
0 4 11 1
1 5 2 8
2 2 115 66
3 6 8 4
In [283]:
# Equivalent to df.idxmax(axis=0): for each column, return the row label of its maximum.
df.idxmax()
Out[283]:
A    3
B    2
C    2
dtype: int64
In [284]:
df.loc[df.idxmax(axis=0)]
Out[284]:
A B C
3 6 8 4
2 2 115 66
2 2 115 66
In [285]:
# axis=1 ('columns'): for each row, return the column label of its maximum.
df.idxmax(axis=1)
Out[285]:
0    B
1    C
2    B
3    B
dtype: object
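The two bullet points for idxmax (first occurrence, NA values excluded) can be checked directly. This is a small sketch on a hypothetical frame `demo` that contains a tied maximum in one column and a missing value in the other.

```python
import numpy as np
import pandas as pd

# Column A has its maximum (7) twice; column B starts with a missing value.
demo = pd.DataFrame({"A": [7, 7, 1],
                     "B": [np.nan, 3.0, 2.0]})

# For A the first of the tied rows is reported; for B the NaN is skipped.
first_max = demo.idxmax()
```

Here `first_max["A"]` is the label of the first tied row and `first_max["B"]` is the label of the largest non-missing value.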

idxmin

  • Return index of first occurrence of minimum over requested axis.
  • NA/null values are excluded.
In [286]:
df = pd.DataFrame({"A":[4, 5, 2, 6],  
                   "B":[11, 2, 115, 8], 
                   "C":[1, 8, 66, 4]})
df
Out[286]:
A B C
0 4 11 1
1 5 2 8
2 2 115 66
3 6 8 4
In [287]:
# Equivalent to df.idxmin(axis=0): for each column, return the row label of its minimum.
df.idxmin()
Out[287]:
A    2
B    1
C    0
dtype: int64
In [288]:
# axis=1 ('columns'): for each row, return the column label of its minimum.
df.idxmin(axis=1)
Out[288]:
0    C
1    B
2    A
3    C
dtype: object
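Because idxmin returns a label, it is often combined with `loc` to fetch the whole row that holds the minimum. The following sketch reuses the same values as the frame above; `min_label` and `min_row` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 5, 2, 6],
                   "B": [11, 2, 115, 8],
                   "C": [1, 8, 66, 4]})

# Label of the row where column A is smallest.
min_label = df["A"].idxmin()

# Full row at that label, returned as a Series.
min_row = df.loc[min_label]
```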

first_valid_index

  • first_valid_index: Return index for first non-NA/null value.
In [289]:
# np.nan is used here because the np.NaN alias was removed in NumPy 2.0.
df = pd.DataFrame({
                   "A1":[np.nan, np.nan, np.nan, 5, 6, np.nan, np.nan],
                   })
df
Out[289]:
A1
0 NaN
1 NaN
2 NaN
3 5.0
4 6.0
5 NaN
6 NaN
In [290]:
df.first_valid_index()
Out[290]:
3

last_valid_index

  • last_valid_index: Return index for last non-NA/null value.
In [291]:
# np.nan is used here because the np.NaN alias was removed in NumPy 2.0.
df = pd.DataFrame({
                   "A1":[np.nan, np.nan, np.nan, 5, 6, np.nan, np.nan],
                   })
df
Out[291]:
A1
0 NaN
1 NaN
2 NaN
3 5.0
4 6.0
5 NaN
6 NaN
In [292]:
df.last_valid_index()
Out[292]:
4
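One common use of first_valid_index and last_valid_index together is trimming leading and trailing missing values. The sketch below applies this to the same values as column A1; the Series `s` and the name `trimmed` are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, np.nan, 5, 6, np.nan, np.nan], name="A1")

# Label-based slicing with loc includes both end points,
# so this keeps everything from the first to the last valid value.
trimmed = s.loc[s.first_valid_index():s.last_valid_index()]
```

Missing values that lie between the first and last valid entries would be kept; only the leading and trailing runs are removed.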

transpose

In [293]:
df=pd.read_csv('rep.csv')
df
Out[293]:
    Name Degree  Rep
0   Umer  MS DS    2
1    Ali    BAA    3
2  Ahmed  MS CS    1
3  Bilal  MS EE    4

Method-1

In [294]:
df.T
Out[294]:
            0    1      2      3
Name     Umer  Ali  Ahmed  Bilal
Degree  MS DS  BAA  MS CS  MS EE
Rep         2    3      1      4

Method-2

In [295]:
df.transpose()
Out[295]:
            0    1      2      3
Name     Umer  Ali  Ahmed  Bilal
Degree  MS DS  BAA  MS CS  MS EE
Rep         2    3      1      4
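One side effect of transposing a frame with mixed column types is worth knowing: every column of the result has to hold both text and numbers, so the columns become `object` dtype. The sketch below builds the frame inline, assuming the contents of `rep.csv` match the output shown above.

```python
import pandas as pd

# Inline stand-in for rep.csv (assumed to match the displayed output).
df = pd.DataFrame({"Name": ["Umer", "Ali", "Ahmed", "Bilal"],
                   "Degree": ["MS DS", "BAA", "MS CS", "MS EE"],
                   "Rep": [2, 3, 1, 4]})

transposed = df.T

# Each transposed column mixes strings and integers, hence object dtype.
all_object = (transposed.dtypes == object).all()

# Transposing back restores the shape, but Rep stays object, not integer.
round_trip = transposed.T
rep_is_object = round_trip["Rep"].dtype == object
```

If numeric dtypes matter after a round trip, they can be restored with `round_trip.infer_objects()` or an explicit `astype`.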

xs

  • Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.
  • Defaults to cross-section on the rows (axis=0).
In [296]:
df=pd.read_csv('rep.csv')
df.xs(1)
Out[296]:
Name      Ali 
Degree     BAA
Rep          3
Name: 1, dtype: object
In [297]:
df.xs('Name',axis=1)
Out[297]:
0     Umer
1     Ali 
2    Ahmed
3    Bilal
Name: Name, dtype: object
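The examples above use xs on a single-level index, where it behaves much like `loc`. It is more useful on a MultiIndex, because the `level` argument selects on one level by name. The sketch below uses a hypothetical frame `pop` indexed by Region and Country, with population values taken from the WHO table.

```python
import pandas as pd

# Hypothetical two-level index: Region, then Country.
idx = pd.MultiIndex.from_tuples(
    [("Europe", "Albania"), ("Europe", "Austria"), ("Africa", "Angola")],
    names=["Region", "Country"])
pop = pd.DataFrame({"Population": [3162, 8464, 20821]}, index=idx)

# Select all rows whose Region level equals "Europe".
# By default (drop_level=True) the selected level is removed from the result.
europe = pop.xs("Europe", level="Region")
```

Passing `drop_level=False` would keep the Region level in the result's index.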

slice_shift

In [298]:
df = pd.DataFrame({"A":[1, 2, 3, 4, 5],  
                   "B":[10, 20, 30, 40, 50],  
                   "C":[11, 22, 33, 44, 55], 
                   "D":[12, 24, 51, 36, 2]})
df
Out[298]:
A B C D
0 1 10 11 12
1 2 20 22 24
2 3 30 33 51
3 4 40 44 36
4 5 50 55 2
In [299]:
df.slice_shift(2)
Out[299]:
A B C D
2 1 10 11 12
3 2 20 22 24
4 3 30 33 51
In [300]:
df.slice_shift(2, axis = 0)
Out[300]:
A B C D
2 1 10 11 12
3 2 20 22 24
4 3 30 33 51
In [301]:
df.slice_shift(2, axis = 1)
Out[301]:
C D
0 1 10
1 2 20
2 3 30
3 4 40
4 5 50
In [302]:
df.slice_shift(-2, axis = 1)
Out[302]:
A B
0 11 12
1 22 24
2 33 51
3 44 36
4 55 2
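Note that slice_shift was deprecated in pandas 1.2 and removed in pandas 2.0, so the cells above only run on older versions. The row-wise result can be approximated on current versions as sketched below; this is one possible replacement under that assumption, not an official equivalent, and `periods` and `shifted` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5],
                   "B": [10, 20, 30, 40, 50]})

periods = 2

# Keep all but the last `periods` rows, then relabel them with the
# last len(df) - periods index labels. No NaN is introduced,
# so the integer dtypes are preserved.
shifted = df.iloc[:-periods].set_axis(df.index[periods:], axis=0)
```

By contrast, `df.shift(2)` keeps all five rows and fills the first two with NaN, which converts integer columns to float.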

truncate

  • Truncate a Series or DataFrame before and after some index value.
In [303]:
df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
                    'B': ['f', 'g', 'h', 'i', 'j'],
                    'C': ['k', 'l', 'm', 'n', 'o']})
df
Out[303]:
A B C
0 a f k
1 b g l
2 c h m
3 d i n
4 e j o
In [304]:
df.truncate(before=2, after=3)
Out[304]:
A B C
2 c h m
3 d i n
In [305]:
df.truncate(before="A", after="B", axis="columns")
Out[305]:
A B
0 a f
1 b g
2 c h
3 d i
4 e j
In [306]:
df['A'].truncate(before=2, after=4)
Out[306]:
2    c
3    d
4    e
Name: A, dtype: object
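On a sorted index, truncate gives the same rows as a label-based slice with `loc`, since both include the two end points. A short sketch comparing the two on a smaller version of the frame above:

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "d", "e"],
                   "B": ["f", "g", "h", "i", "j"]})

by_truncate = df.truncate(before=2, after=3)
by_loc = df.loc[2:3]  # loc slices by label and includes both ends

same = by_truncate.equals(by_loc)
```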

11.18 Conditional Filtering in Pandas

In [307]:
from IPython.display import YouTubeVideo
YouTubeVideo('2AFGPdNn4FM',width=900, height=500)
Out[307]:
In [308]:
import pandas as pd
df_who= pd.read_csv('WHO_csv.csv')

Example-1(a)

  • Filter Population greater than 360000.

Method-1

In [309]:
df_who[df_who.Population>360000]
Out[309]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
35 China Western Pacific 1390000 17.95 13.42 1.66 76 14.0 73.19 94.3 8390.0 NaN NaN
77 India South-East Asia 1240000 29.43 8.10 2.53 65 56.3 72.00 NaN 3590.0 NaN NaN

Method-2

In [310]:
df_who[df_who['Population'].gt(360000)]
Out[310]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
35 China Western Pacific 1390000 17.95 13.42 1.66 76 14.0 73.19 94.3 8390.0 NaN NaN
77 India South-East Asia 1240000 29.43 8.10 2.53 65 56.3 72.00 NaN 3590.0 NaN NaN
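A third way to express the same filter is `DataFrame.query`, which takes the condition as a string. The sketch below uses a hypothetical four-row `demo` frame with values copied from the WHO table; `threshold` is an illustrative name.

```python
import pandas as pd

# Hypothetical subset of the WHO data.
demo = pd.DataFrame({
    "Country": ["China", "India", "United States of America", "Brazil"],
    "Population": [1390000, 1240000, 318000, 199000]})

# The condition is written as a string; column names are used directly.
large = demo.query("Population > 360000")

# A Python variable can be referenced inside the string with the @ prefix.
threshold = 360000
large_var = demo.query("Population > @threshold")
```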

Example-1(b)

  • Filter Population less than 360000.

Method-1

In [311]:
df_who[df_who.Population<360000]
Out[311]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
6 Argentina Americas 41087 24.42 14.97 2.20 76 14.2 134.92 97.8 17130.0 NaN NaN
7 Armenia Europe 2969 20.34 14.06 1.74 71 16.4 103.57 99.6 6100.0 NaN NaN
8 Australia Western Pacific 23050 18.95 19.46 1.89 82 4.9 108.34 NaN 38110.0 96.9 97.5
9 Austria Europe 8464 14.51 23.52 1.44 81 4.0 154.78 NaN 42050.0 NaN NaN
10 Azerbaijan Europe 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
12 Bahrain Eastern Mediterranean 1318 20.16 3.38 2.12 79 9.6 127.96 91.9 NaN NaN NaN
13 Bangladesh South-East Asia 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN
14 Barbados Americas 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
15 Belarus Europe 9405 15.10 19.31 1.47 71 5.2 111.88 NaN 14460.0 NaN NaN
16 Belgium Europe 11060 16.88 23.81 1.85 80 4.2 116.61 NaN 39190.0 98.9 99.2
17 Belize Americas 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
18 Benin Africa 10051 42.95 4.54 5.01 57 89.5 85.33 42.4 1620.0 NaN NaN
19 Bhutan South-East Asia 742 28.53 6.90 2.32 67 44.6 65.58 NaN 5570.0 88.3 91.5
20 Bolivia (Plurinational State of) Americas 10496 35.23 7.28 3.31 67 41.4 82.82 NaN 4890.0 91.2 91.5
21 Bosnia and Herzegovina Europe 3834 16.35 20.52 1.26 76 6.7 84.52 97.9 9190.0 86.5 88.4
22 Botswana Africa 2004 33.75 5.63 2.71 66 53.3 142.82 84.5 14550.0 NaN NaN
23 Brazil Americas 199000 24.56 10.81 1.82 74 14.4 124.26 NaN 11420.0 NaN NaN
24 Brunei Darussalam Western Pacific 412 25.75 7.03 2.03 77 8.0 109.17 95.2 NaN NaN NaN
25 Bulgaria Europe 7278 13.53 26.11 1.51 74 12.1 140.68 NaN 14160.0 99.3 99.7
26 Burkina Faso Africa 16460 45.66 3.88 5.78 56 102.4 45.27 NaN 1300.0 60.7 55.9
27 Burundi Africa 9850 44.20 3.87 6.21 53 104.3 22.33 67.2 610.0 NaN NaN
28 Cambodia Western Pacific 14865 31.23 7.67 2.93 65 39.7 96.17 NaN 2230.0 96.4 95.4
29 Cameroon Africa 21700 43.08 4.89 4.94 53 94.9 52.35 NaN 2330.0 99.6 87.4
... ... ... ... ... ... ... ... ... ... ... ... ... ...
164 Suriname Americas 535 27.83 9.55 2.32 72 20.8 178.88 94.7 NaN NaN NaN
165 Swaziland Africa 1231 38.05 5.34 3.48 50 79.7 63.70 87.4 5930.0 NaN NaN
166 Sweden Europe 9511 16.71 25.32 1.93 82 2.9 118.57 NaN 42200.0 99.7 99.0
167 Switzerland Europe 7997 14.79 23.25 1.51 83 4.3 131.43 NaN 52570.0 98.9 99.5
168 Syrian Arab Republic Eastern Mediterranean 21890 35.35 6.09 3.04 75 15.1 63.17 83.4 NaN NaN NaN
169 Tajikistan Europe 8009 35.75 4.80 3.81 68 58.3 90.64 99.7 2300.0 99.5 96.0
170 Thailand South-East Asia 66785 18.47 13.96 1.43 74 13.2 111.63 NaN 8360.0 NaN NaN
171 The former Yugoslav Republic of Macedonia Europe 2106 16.89 17.56 1.44 75 7.4 107.24 97.3 11090.0 97.3 99.2
172 Timor-Leste South-East Asia 1114 46.33 5.16 6.11 64 56.7 53.23 58.3 NaN 86.2 85.6
173 Togo Africa 6643 41.89 4.44 4.75 56 95.5 50.45 NaN 1040.0 NaN NaN
174 Tonga Western Pacific 105 37.33 7.96 3.86 72 12.8 52.63 NaN 5000.0 NaN NaN
175 Trinidad and Tobago Americas 1337 20.73 13.18 1.80 71 20.7 135.64 98.8 NaN 97.7 97.0
176 Tunisia Eastern Mediterranean 10875 23.22 10.49 2.04 76 16.1 116.93 NaN 9030.0 NaN NaN
177 Turkey Europe 73997 26.00 10.56 2.08 76 14.2 88.70 NaN 16940.0 99.5 98.3
178 Turkmenistan Europe 5173 28.65 6.30 2.38 63 52.8 68.77 99.6 8690.0 NaN NaN
179 Tuvalu Western Pacific 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN
180 Uganda Africa 36346 48.54 3.72 6.06 56 68.9 48.38 73.2 1310.0 89.7 92.3
181 Ukraine Europe 45530 14.18 20.76 1.45 71 10.7 122.98 99.7 7040.0 90.8 91.5
182 United Arab Emirates Eastern Mediterranean 9206 14.41 0.81 1.84 76 8.4 148.62 NaN 47890.0 NaN NaN
183 United Kingdom Europe 62783 17.54 23.06 1.90 80 4.8 130.75 NaN 36010.0 99.8 99.6
184 United Republic of Tanzania Africa 47783 44.85 4.89 5.36 59 54.0 55.53 73.2 1500.0 NaN NaN
185 United States of America Americas 318000 19.63 19.31 2.00 79 7.1 92.72 NaN 48820.0 95.4 96.1
186 Uruguay Americas 3395 22.05 18.59 2.07 77 7.2 140.75 98.1 14640.0 NaN NaN
187 Uzbekistan Europe 28541 28.90 6.38 2.38 68 39.6 91.65 99.4 3420.0 93.3 91.0
188 Vanuatu Western Pacific 247 37.37 6.02 3.46 72 17.9 55.76 82.6 4330.0 NaN NaN
189 Venezuela (Bolivarian Republic of) Americas 29955 28.84 9.17 2.44 75 15.3 97.78 NaN 12430.0 94.7 95.1
190 Viet Nam Western Pacific 90796 22.87 9.32 1.79 75 23.0 143.39 93.2 3250.0 NaN NaN
191 Yemen Eastern Mediterranean 23852 40.72 4.54 4.35 64 60.0 47.05 63.9 2170.0 85.5 70.5
192 Zambia Africa 14075 46.73 3.95 5.77 55 88.5 60.59 71.2 1490.0 91.4 93.9
193 Zimbabwe Africa 13724 40.24 5.68 3.64 54 89.8 72.13 92.2 NaN NaN NaN

192 rows × 13 columns

Method-2

In [312]:
df_who[df_who['Population'].lt(360000)]
Out[312]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
6 Argentina Americas 41087 24.42 14.97 2.20 76 14.2 134.92 97.8 17130.0 NaN NaN
7 Armenia Europe 2969 20.34 14.06 1.74 71 16.4 103.57 99.6 6100.0 NaN NaN
8 Australia Western Pacific 23050 18.95 19.46 1.89 82 4.9 108.34 NaN 38110.0 96.9 97.5
9 Austria Europe 8464 14.51 23.52 1.44 81 4.0 154.78 NaN 42050.0 NaN NaN
10 Azerbaijan Europe 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
12 Bahrain Eastern Mediterranean 1318 20.16 3.38 2.12 79 9.6 127.96 91.9 NaN NaN NaN
13 Bangladesh South-East Asia 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN
14 Barbados Americas 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
15 Belarus Europe 9405 15.10 19.31 1.47 71 5.2 111.88 NaN 14460.0 NaN NaN
16 Belgium Europe 11060 16.88 23.81 1.85 80 4.2 116.61 NaN 39190.0 98.9 99.2
17 Belize Americas 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
18 Benin Africa 10051 42.95 4.54 5.01 57 89.5 85.33 42.4 1620.0 NaN NaN
19 Bhutan South-East Asia 742 28.53 6.90 2.32 67 44.6 65.58 NaN 5570.0 88.3 91.5
20 Bolivia (Plurinational State of) Americas 10496 35.23 7.28 3.31 67 41.4 82.82 NaN 4890.0 91.2 91.5
21 Bosnia and Herzegovina Europe 3834 16.35 20.52 1.26 76 6.7 84.52 97.9 9190.0 86.5 88.4
22 Botswana Africa 2004 33.75 5.63 2.71 66 53.3 142.82 84.5 14550.0 NaN NaN
23 Brazil Americas 199000 24.56 10.81 1.82 74 14.4 124.26 NaN 11420.0 NaN NaN
24 Brunei Darussalam Western Pacific 412 25.75 7.03 2.03 77 8.0 109.17 95.2 NaN NaN NaN
25 Bulgaria Europe 7278 13.53 26.11 1.51 74 12.1 140.68 NaN 14160.0 99.3 99.7
26 Burkina Faso Africa 16460 45.66 3.88 5.78 56 102.4 45.27 NaN 1300.0 60.7 55.9
27 Burundi Africa 9850 44.20 3.87 6.21 53 104.3 22.33 67.2 610.0 NaN NaN
28 Cambodia Western Pacific 14865 31.23 7.67 2.93 65 39.7 96.17 NaN 2230.0 96.4 95.4
29 Cameroon Africa 21700 43.08 4.89 4.94 53 94.9 52.35 NaN 2330.0 99.6 87.4
... ... ... ... ... ... ... ... ... ... ... ... ... ...
164 Suriname Americas 535 27.83 9.55 2.32 72 20.8 178.88 94.7 NaN NaN NaN
165 Swaziland Africa 1231 38.05 5.34 3.48 50 79.7 63.70 87.4 5930.0 NaN NaN
166 Sweden Europe 9511 16.71 25.32 1.93 82 2.9 118.57 NaN 42200.0 99.7 99.0
167 Switzerland Europe 7997 14.79 23.25 1.51 83 4.3 131.43 NaN 52570.0 98.9 99.5
168 Syrian Arab Republic Eastern Mediterranean 21890 35.35 6.09 3.04 75 15.1 63.17 83.4 NaN NaN NaN
169 Tajikistan Europe 8009 35.75 4.80 3.81 68 58.3 90.64 99.7 2300.0 99.5 96.0
170 Thailand South-East Asia 66785 18.47 13.96 1.43 74 13.2 111.63 NaN 8360.0 NaN NaN
171 The former Yugoslav Republic of Macedonia Europe 2106 16.89 17.56 1.44 75 7.4 107.24 97.3 11090.0 97.3 99.2
172 Timor-Leste South-East Asia 1114 46.33 5.16 6.11 64 56.7 53.23 58.3 NaN 86.2 85.6
173 Togo Africa 6643 41.89 4.44 4.75 56 95.5 50.45 NaN 1040.0 NaN NaN
174 Tonga Western Pacific 105 37.33 7.96 3.86 72 12.8 52.63 NaN 5000.0 NaN NaN
175 Trinidad and Tobago Americas 1337 20.73 13.18 1.80 71 20.7 135.64 98.8 NaN 97.7 97.0
176 Tunisia Eastern Mediterranean 10875 23.22 10.49 2.04 76 16.1 116.93 NaN 9030.0 NaN NaN
177 Turkey Europe 73997 26.00 10.56 2.08 76 14.2 88.70 NaN 16940.0 99.5 98.3
178 Turkmenistan Europe 5173 28.65 6.30 2.38 63 52.8 68.77 99.6 8690.0 NaN NaN
179 Tuvalu Western Pacific 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN
180 Uganda Africa 36346 48.54 3.72 6.06 56 68.9 48.38 73.2 1310.0 89.7 92.3
181 Ukraine Europe 45530 14.18 20.76 1.45 71 10.7 122.98 99.7 7040.0 90.8 91.5
182 United Arab Emirates Eastern Mediterranean 9206 14.41 0.81 1.84 76 8.4 148.62 NaN 47890.0 NaN NaN
183 United Kingdom Europe 62783 17.54 23.06 1.90 80 4.8 130.75 NaN 36010.0 99.8 99.6
184 United Republic of Tanzania Africa 47783 44.85 4.89 5.36 59 54.0 55.53 73.2 1500.0 NaN NaN
185 United States of America Americas 318000 19.63 19.31 2.00 79 7.1 92.72 NaN 48820.0 95.4 96.1
186 Uruguay Americas 3395 22.05 18.59 2.07 77 7.2 140.75 98.1 14640.0 NaN NaN
187 Uzbekistan Europe 28541 28.90 6.38 2.38 68 39.6 91.65 99.4 3420.0 93.3 91.0
188 Vanuatu Western Pacific 247 37.37 6.02 3.46 72 17.9 55.76 82.6 4330.0 NaN NaN
189 Venezuela (Bolivarian Republic of) Americas 29955 28.84 9.17 2.44 75 15.3 97.78 NaN 12430.0 94.7 95.1
190 Viet Nam Western Pacific 90796 22.87 9.32 1.79 75 23.0 143.39 93.2 3250.0 NaN NaN
191 Yemen Eastern Mediterranean 23852 40.72 4.54 4.35 64 60.0 47.05 63.9 2170.0 85.5 70.5
192 Zambia Africa 14075 46.73 3.95 5.77 55 88.5 60.59 71.2 1490.0 91.4 93.9
193 Zimbabwe Africa 13724 40.24 5.68 3.64 54 89.8 72.13 92.2 NaN NaN NaN

192 rows × 13 columns

Example-1(c)

  • Filter Population equal to 28541.

Method-1

In [313]:
df_who[df_who.Population==28541]
Out[313]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
187 Uzbekistan Europe 28541 28.9 6.38 2.38 68 39.6 91.65 99.4 3420.0 93.3 91.0

Method-2

In [314]:
df_who[df_who['Population'].eq(28541)]
Out[314]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
187 Uzbekistan Europe 28541 28.9 6.38 2.38 68 39.6 91.65 99.4 3420.0 93.3 91.0

Example-1(d)

  • Filter Population not equal to 28541.

Method-1

In [315]:
df_who[df_who.Population!=28541]
Out[315]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
6 Argentina Americas 41087 24.42 14.97 2.20 76 14.2 134.92 97.8 17130.0 NaN NaN
7 Armenia Europe 2969 20.34 14.06 1.74 71 16.4 103.57 99.6 6100.0 NaN NaN
8 Australia Western Pacific 23050 18.95 19.46 1.89 82 4.9 108.34 NaN 38110.0 96.9 97.5
9 Austria Europe 8464 14.51 23.52 1.44 81 4.0 154.78 NaN 42050.0 NaN NaN
10 Azerbaijan Europe 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
12 Bahrain Eastern Mediterranean 1318 20.16 3.38 2.12 79 9.6 127.96 91.9 NaN NaN NaN
13 Bangladesh South-East Asia 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN
14 Barbados Americas 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
15 Belarus Europe 9405 15.10 19.31 1.47 71 5.2 111.88 NaN 14460.0 NaN NaN
16 Belgium Europe 11060 16.88 23.81 1.85 80 4.2 116.61 NaN 39190.0 98.9 99.2
17 Belize Americas 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
18 Benin Africa 10051 42.95 4.54 5.01 57 89.5 85.33 42.4 1620.0 NaN NaN
19 Bhutan South-East Asia 742 28.53 6.90 2.32 67 44.6 65.58 NaN 5570.0 88.3 91.5
20 Bolivia (Plurinational State of) Americas 10496 35.23 7.28 3.31 67 41.4 82.82 NaN 4890.0 91.2 91.5
21 Bosnia and Herzegovina Europe 3834 16.35 20.52 1.26 76 6.7 84.52 97.9 9190.0 86.5 88.4
22 Botswana Africa 2004 33.75 5.63 2.71 66 53.3 142.82 84.5 14550.0 NaN NaN
23 Brazil Americas 199000 24.56 10.81 1.82 74 14.4 124.26 NaN 11420.0 NaN NaN
24 Brunei Darussalam Western Pacific 412 25.75 7.03 2.03 77 8.0 109.17 95.2 NaN NaN NaN
25 Bulgaria Europe 7278 13.53 26.11 1.51 74 12.1 140.68 NaN 14160.0 99.3 99.7
26 Burkina Faso Africa 16460 45.66 3.88 5.78 56 102.4 45.27 NaN 1300.0 60.7 55.9
27 Burundi Africa 9850 44.20 3.87 6.21 53 104.3 22.33 67.2 610.0 NaN NaN
28 Cambodia Western Pacific 14865 31.23 7.67 2.93 65 39.7 96.17 NaN 2230.0 96.4 95.4
29 Cameroon Africa 21700 43.08 4.89 4.94 53 94.9 52.35 NaN 2330.0 99.6 87.4
... ... ... ... ... ... ... ... ... ... ... ... ... ...
163 Sudan Eastern Mediterranean 37195 41.48 4.99 4.56 62 73.1 56.14 71.1 2120.0 NaN NaN
164 Suriname Americas 535 27.83 9.55 2.32 72 20.8 178.88 94.7 NaN NaN NaN
165 Swaziland Africa 1231 38.05 5.34 3.48 50 79.7 63.70 87.4 5930.0 NaN NaN
166 Sweden Europe 9511 16.71 25.32 1.93 82 2.9 118.57 NaN 42200.0 99.7 99.0
167 Switzerland Europe 7997 14.79 23.25 1.51 83 4.3 131.43 NaN 52570.0 98.9 99.5
168 Syrian Arab Republic Eastern Mediterranean 21890 35.35 6.09 3.04 75 15.1 63.17 83.4 NaN NaN NaN
169 Tajikistan Europe 8009 35.75 4.80 3.81 68 58.3 90.64 99.7 2300.0 99.5 96.0
170 Thailand South-East Asia 66785 18.47 13.96 1.43 74 13.2 111.63 NaN 8360.0 NaN NaN
171 The former Yugoslav Republic of Macedonia Europe 2106 16.89 17.56 1.44 75 7.4 107.24 97.3 11090.0 97.3 99.2
172 Timor-Leste South-East Asia 1114 46.33 5.16 6.11 64 56.7 53.23 58.3 NaN 86.2 85.6
173 Togo Africa 6643 41.89 4.44 4.75 56 95.5 50.45 NaN 1040.0 NaN NaN
174 Tonga Western Pacific 105 37.33 7.96 3.86 72 12.8 52.63 NaN 5000.0 NaN NaN
175 Trinidad and Tobago Americas 1337 20.73 13.18 1.80 71 20.7 135.64 98.8 NaN 97.7 97.0
176 Tunisia Eastern Mediterranean 10875 23.22 10.49 2.04 76 16.1 116.93 NaN 9030.0 NaN NaN
177 Turkey Europe 73997 26.00 10.56 2.08 76 14.2 88.70 NaN 16940.0 99.5 98.3
178 Turkmenistan Europe 5173 28.65 6.30 2.38 63 52.8 68.77 99.6 8690.0 NaN NaN
179 Tuvalu Western Pacific 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN
180 Uganda Africa 36346 48.54 3.72 6.06 56 68.9 48.38 73.2 1310.0 89.7 92.3
181 Ukraine Europe 45530 14.18 20.76 1.45 71 10.7 122.98 99.7 7040.0 90.8 91.5
182 United Arab Emirates Eastern Mediterranean 9206 14.41 0.81 1.84 76 8.4 148.62 NaN 47890.0 NaN NaN
183 United Kingdom Europe 62783 17.54 23.06 1.90 80 4.8 130.75 NaN 36010.0 99.8 99.6
184 United Republic of Tanzania Africa 47783 44.85 4.89 5.36 59 54.0 55.53 73.2 1500.0 NaN NaN
185 United States of America Americas 318000 19.63 19.31 2.00 79 7.1 92.72 NaN 48820.0 95.4 96.1
186 Uruguay Americas 3395 22.05 18.59 2.07 77 7.2 140.75 98.1 14640.0 NaN NaN
188 Vanuatu Western Pacific 247 37.37 6.02 3.46 72 17.9 55.76 82.6 4330.0 NaN NaN
189 Venezuela (Bolivarian Republic of) Americas 29955 28.84 9.17 2.44 75 15.3 97.78 NaN 12430.0 94.7 95.1
190 Viet Nam Western Pacific 90796 22.87 9.32 1.79 75 23.0 143.39 93.2 3250.0 NaN NaN
191 Yemen Eastern Mediterranean 23852 40.72 4.54 4.35 64 60.0 47.05 63.9 2170.0 85.5 70.5
192 Zambia Africa 14075 46.73 3.95 5.77 55 88.5 60.59 71.2 1490.0 91.4 93.9
193 Zimbabwe Africa 13724 40.24 5.68 3.64 54 89.8 72.13 92.2 NaN NaN NaN

193 rows × 13 columns

Method-2

In [316]:
df_who[df_who['Population'].ne(28541)]
Out[316]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
6 Argentina Americas 41087 24.42 14.97 2.20 76 14.2 134.92 97.8 17130.0 NaN NaN
7 Armenia Europe 2969 20.34 14.06 1.74 71 16.4 103.57 99.6 6100.0 NaN NaN
8 Australia Western Pacific 23050 18.95 19.46 1.89 82 4.9 108.34 NaN 38110.0 96.9 97.5
9 Austria Europe 8464 14.51 23.52 1.44 81 4.0 154.78 NaN 42050.0 NaN NaN
10 Azerbaijan Europe 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
12 Bahrain Eastern Mediterranean 1318 20.16 3.38 2.12 79 9.6 127.96 91.9 NaN NaN NaN
13 Bangladesh South-East Asia 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN
14 Barbados Americas 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
15 Belarus Europe 9405 15.10 19.31 1.47 71 5.2 111.88 NaN 14460.0 NaN NaN
16 Belgium Europe 11060 16.88 23.81 1.85 80 4.2 116.61 NaN 39190.0 98.9 99.2
17 Belize Americas 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
18 Benin Africa 10051 42.95 4.54 5.01 57 89.5 85.33 42.4 1620.0 NaN NaN
19 Bhutan South-East Asia 742 28.53 6.90 2.32 67 44.6 65.58 NaN 5570.0 88.3 91.5
20 Bolivia (Plurinational State of) Americas 10496 35.23 7.28 3.31 67 41.4 82.82 NaN 4890.0 91.2 91.5
21 Bosnia and Herzegovina Europe 3834 16.35 20.52 1.26 76 6.7 84.52 97.9 9190.0 86.5 88.4
22 Botswana Africa 2004 33.75 5.63 2.71 66 53.3 142.82 84.5 14550.0 NaN NaN
23 Brazil Americas 199000 24.56 10.81 1.82 74 14.4 124.26 NaN 11420.0 NaN NaN
24 Brunei Darussalam Western Pacific 412 25.75 7.03 2.03 77 8.0 109.17 95.2 NaN NaN NaN
25 Bulgaria Europe 7278 13.53 26.11 1.51 74 12.1 140.68 NaN 14160.0 99.3 99.7
26 Burkina Faso Africa 16460 45.66 3.88 5.78 56 102.4 45.27 NaN 1300.0 60.7 55.9
27 Burundi Africa 9850 44.20 3.87 6.21 53 104.3 22.33 67.2 610.0 NaN NaN
28 Cambodia Western Pacific 14865 31.23 7.67 2.93 65 39.7 96.17 NaN 2230.0 96.4 95.4
29 Cameroon Africa 21700 43.08 4.89 4.94 53 94.9 52.35 NaN 2330.0 99.6 87.4
... ... ... ... ... ... ... ... ... ... ... ... ... ...
163 Sudan Eastern Mediterranean 37195 41.48 4.99 4.56 62 73.1 56.14 71.1 2120.0 NaN NaN
164 Suriname Americas 535 27.83 9.55 2.32 72 20.8 178.88 94.7 NaN NaN NaN
165 Swaziland Africa 1231 38.05 5.34 3.48 50 79.7 63.70 87.4 5930.0 NaN NaN
166 Sweden Europe 9511 16.71 25.32 1.93 82 2.9 118.57 NaN 42200.0 99.7 99.0
167 Switzerland Europe 7997 14.79 23.25 1.51 83 4.3 131.43 NaN 52570.0 98.9 99.5
168 Syrian Arab Republic Eastern Mediterranean 21890 35.35 6.09 3.04 75 15.1 63.17 83.4 NaN NaN NaN
169 Tajikistan Europe 8009 35.75 4.80 3.81 68 58.3 90.64 99.7 2300.0 99.5 96.0
170 Thailand South-East Asia 66785 18.47 13.96 1.43 74 13.2 111.63 NaN 8360.0 NaN NaN
171 The former Yugoslav Republic of Macedonia Europe 2106 16.89 17.56 1.44 75 7.4 107.24 97.3 11090.0 97.3 99.2
172 Timor-Leste South-East Asia 1114 46.33 5.16 6.11 64 56.7 53.23 58.3 NaN 86.2 85.6
173 Togo Africa 6643 41.89 4.44 4.75 56 95.5 50.45 NaN 1040.0 NaN NaN
174 Tonga Western Pacific 105 37.33 7.96 3.86 72 12.8 52.63 NaN 5000.0 NaN NaN
175 Trinidad and Tobago Americas 1337 20.73 13.18 1.80 71 20.7 135.64 98.8 NaN 97.7 97.0
176 Tunisia Eastern Mediterranean 10875 23.22 10.49 2.04 76 16.1 116.93 NaN 9030.0 NaN NaN
177 Turkey Europe 73997 26.00 10.56 2.08 76 14.2 88.70 NaN 16940.0 99.5 98.3
178 Turkmenistan Europe 5173 28.65 6.30 2.38 63 52.8 68.77 99.6 8690.0 NaN NaN
179 Tuvalu Western Pacific 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN
180 Uganda Africa 36346 48.54 3.72 6.06 56 68.9 48.38 73.2 1310.0 89.7 92.3
181 Ukraine Europe 45530 14.18 20.76 1.45 71 10.7 122.98 99.7 7040.0 90.8 91.5
182 United Arab Emirates Eastern Mediterranean 9206 14.41 0.81 1.84 76 8.4 148.62 NaN 47890.0 NaN NaN
183 United Kingdom Europe 62783 17.54 23.06 1.90 80 4.8 130.75 NaN 36010.0 99.8 99.6
184 United Republic of Tanzania Africa 47783 44.85 4.89 5.36 59 54.0 55.53 73.2 1500.0 NaN NaN
185 United States of America Americas 318000 19.63 19.31 2.00 79 7.1 92.72 NaN 48820.0 95.4 96.1
186 Uruguay Americas 3395 22.05 18.59 2.07 77 7.2 140.75 98.1 14640.0 NaN NaN
188 Vanuatu Western Pacific 247 37.37 6.02 3.46 72 17.9 55.76 82.6 4330.0 NaN NaN
189 Venezuela (Bolivarian Republic of) Americas 29955 28.84 9.17 2.44 75 15.3 97.78 NaN 12430.0 94.7 95.1
190 Viet Nam Western Pacific 90796 22.87 9.32 1.79 75 23.0 143.39 93.2 3250.0 NaN NaN
191 Yemen Eastern Mediterranean 23852 40.72 4.54 4.35 64 60.0 47.05 63.9 2170.0 85.5 70.5
192 Zambia Africa 14075 46.73 3.95 5.77 55 88.5 60.59 71.2 1490.0 91.4 93.9
193 Zimbabwe Africa 13724 40.24 5.68 3.64 54 89.8 72.13 92.2 NaN NaN NaN

193 rows × 13 columns

Example-1(e)

  • Filter Population greater than or equal to 318000.

Method-1

In [317]:
df_who[df_who.Population>=318000]
Out[317]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
35 China Western Pacific 1390000 17.95 13.42 1.66 76 14.0 73.19 94.3 8390.0 NaN NaN
77 India South-East Asia 1240000 29.43 8.10 2.53 65 56.3 72.00 NaN 3590.0 NaN NaN
185 United States of America Americas 318000 19.63 19.31 2.00 79 7.1 92.72 NaN 48820.0 95.4 96.1

Method-2

In [318]:
df_who[df_who['Population'].ge(318000)]
Out[318]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
35 China Western Pacific 1390000 17.95 13.42 1.66 76 14.0 73.19 94.3 8390.0 NaN NaN
77 India South-East Asia 1240000 29.43 8.10 2.53 65 56.3 72.00 NaN 3590.0 NaN NaN
185 United States of America Americas 318000 19.63 19.31 2.00 79 7.1 92.72 NaN 48820.0 95.4 96.1
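
The two methods above build the same boolean mask. A minimal sketch that checks this, assuming a small hand-built frame (the name df_small is hypothetical; its values are copied from rows shown in this chapter):

```python
import pandas as pd

# Hypothetical miniature of df_who; values copied from rows shown in this chapter.
df_small = pd.DataFrame({
    'Country': ['China', 'India', 'United States of America', 'Tuvalu'],
    'Population': [1390000, 1240000, 318000, 10],
})

mask_operator = df_small.Population >= 318000     # operator form (Method-1)
mask_method = df_small['Population'].ge(318000)   # method form (Method-2)

# Both masks are boolean Series; equals() compares them element by element.
same_mask = mask_operator.equals(mask_method)
big = df_small[mask_method]

print(same_mask)
print(big)
```

The method forms (ge, le, gt, lt, eq, ne) are convenient when chaining calls; the operator forms are usually shorter to read.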

Example-1(f)

Method-1

In [319]:
df_who[df_who.Population<=10000]
Out[319]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
7 Armenia Europe 2969 20.34 14.06 1.74 71 16.4 103.57 99.6 6100.0 NaN NaN
9 Austria Europe 8464 14.51 23.52 1.44 81 4.0 154.78 NaN 42050.0 NaN NaN
10 Azerbaijan Europe 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
12 Bahrain Eastern Mediterranean 1318 20.16 3.38 2.12 79 9.6 127.96 91.9 NaN NaN NaN
14 Barbados Americas 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
15 Belarus Europe 9405 15.10 19.31 1.47 71 5.2 111.88 NaN 14460.0 NaN NaN
17 Belize Americas 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
19 Bhutan South-East Asia 742 28.53 6.90 2.32 67 44.6 65.58 NaN 5570.0 88.3 91.5
21 Bosnia and Herzegovina Europe 3834 16.35 20.52 1.26 76 6.7 84.52 97.9 9190.0 86.5 88.4
22 Botswana Africa 2004 33.75 5.63 2.71 66 53.3 142.82 84.5 14550.0 NaN NaN
24 Brunei Darussalam Western Pacific 412 25.75 7.03 2.03 77 8.0 109.17 95.2 NaN NaN NaN
25 Bulgaria Europe 7278 13.53 26.11 1.51 74 12.1 140.68 NaN 14160.0 99.3 99.7
27 Burundi Africa 9850 44.20 3.87 6.21 53 104.3 22.33 67.2 610.0 NaN NaN
31 Cape Verde Africa 494 30.17 7.05 2.38 72 22.2 79.19 84.3 3980.0 94.6 92.4
32 Central African Republic Africa 4525 40.07 5.74 4.54 48 128.6 40.65 56.0 810.0 81.3 60.6
37 Comoros Africa 718 42.17 4.50 4.85 62 77.6 28.71 74.9 1110.0 NaN NaN
38 Congo Africa 4337 42.37 5.13 5.05 58 96.0 93.84 NaN 3240.0 92.3 89.3
39 Cook Islands Western Pacific 21 30.61 9.07 NaN 77 10.6 NaN NaN NaN 97.6 99.3
40 Costa Rica Americas 4805 23.94 10.15 1.83 79 9.9 92.20 96.2 11860.0 NaN NaN
42 Croatia Europe 4307 14.98 24.69 1.48 77 4.7 116.37 98.8 18760.0 94.8 97.0
44 Cyprus Europe 1129 17.16 16.92 1.47 81 3.2 97.71 98.3 NaN 99.1 99.5
48 Denmark Europe 5598 17.66 23.90 1.88 79 3.7 128.47 NaN 41900.0 94.8 96.9
49 Djibouti Eastern Mediterranean 860 33.72 5.96 3.53 58 80.9 21.32 NaN NaN NaN NaN
50 Dominica Americas 72 25.96 12.35 NaN 74 12.6 164.02 NaN 13000.0 NaN NaN
54 El Salvador Americas 6297 30.62 9.64 2.24 72 15.9 133.54 84.5 6640.0 95.2 95.5
55 Equatorial Guinea Africa 736 38.95 4.53 5.04 54 100.3 59.15 93.9 25620.0 56.5 56.0
... ... ... ... ... ... ... ... ... ... ... ... ... ...
137 Qatar Eastern Mediterranean 2051 13.28 1.73 2.06 82 7.4 123.11 96.3 86440.0 95.7 96.6
139 Republic of Moldova Europe 3514 16.52 16.72 1.47 71 17.6 104.80 98.5 3640.0 90.1 90.1
143 Saint Kitts and Nevis Americas 54 25.96 12.35 NaN 74 9.2 NaN NaN 16470.0 85.8 86.2
144 Saint Lucia Americas 181 24.31 12.13 1.96 75 17.5 123.00 NaN 11220.0 90.2 89.2
145 Saint Vincent and the Grenadines Americas 109 25.70 9.92 2.05 74 23.4 120.52 NaN 10440.0 NaN NaN
146 Samoa Western Pacific 189 37.88 7.39 4.28 73 17.8 NaN 98.8 4270.0 93.2 97.1
147 San Marino Europe 31 14.04 26.97 NaN 83 3.3 111.75 NaN NaN NaN NaN
148 Sao Tome and Principe Africa 188 41.60 4.76 4.22 63 53.2 68.26 89.2 2080.0 NaN NaN
151 Serbia Europe 9553 16.45 20.52 1.37 74 6.6 125.39 97.9 11540.0 94.7 94.4
152 Seychelles Africa 92 21.95 10.05 2.23 74 13.1 145.71 91.8 25140.0 NaN NaN
153 Sierra Leone Africa 5979 41.74 4.41 4.86 47 181.6 35.63 42.1 840.0 NaN NaN
154 Singapore Western Pacific 5303 16.48 15.13 1.27 82 2.9 150.24 95.9 59380.0 NaN NaN
155 Slovakia Europe 5446 15.00 18.60 1.37 76 7.5 109.35 NaN 22130.0 NaN NaN
156 Slovenia Europe 2068 14.16 23.16 1.49 80 3.1 106.56 99.7 26510.0 97.7 97.3
157 Solomon Islands Western Pacific 550 40.37 5.10 4.17 70 31.1 49.77 NaN 2350.0 87.7 87.3
164 Suriname Americas 535 27.83 9.55 2.32 72 20.8 178.88 94.7 NaN NaN NaN
165 Swaziland Africa 1231 38.05 5.34 3.48 50 79.7 63.70 87.4 5930.0 NaN NaN
166 Sweden Europe 9511 16.71 25.32 1.93 82 2.9 118.57 NaN 42200.0 99.7 99.0
167 Switzerland Europe 7997 14.79 23.25 1.51 83 4.3 131.43 NaN 52570.0 98.9 99.5
169 Tajikistan Europe 8009 35.75 4.80 3.81 68 58.3 90.64 99.7 2300.0 99.5 96.0
171 The former Yugoslav Republic of Macedonia Europe 2106 16.89 17.56 1.44 75 7.4 107.24 97.3 11090.0 97.3 99.2
172 Timor-Leste South-East Asia 1114 46.33 5.16 6.11 64 56.7 53.23 58.3 NaN 86.2 85.6
173 Togo Africa 6643 41.89 4.44 4.75 56 95.5 50.45 NaN 1040.0 NaN NaN
174 Tonga Western Pacific 105 37.33 7.96 3.86 72 12.8 52.63 NaN 5000.0 NaN NaN
175 Trinidad and Tobago Americas 1337 20.73 13.18 1.80 71 20.7 135.64 98.8 NaN 97.7 97.0
178 Turkmenistan Europe 5173 28.65 6.30 2.38 63 52.8 68.77 99.6 8690.0 NaN NaN
179 Tuvalu Western Pacific 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN
182 United Arab Emirates Eastern Mediterranean 9206 14.41 0.81 1.84 76 8.4 148.62 NaN 47890.0 NaN NaN
186 Uruguay Americas 3395 22.05 18.59 2.07 77 7.2 140.75 98.1 14640.0 NaN NaN
188 Vanuatu Western Pacific 247 37.37 6.02 3.46 72 17.9 55.76 82.6 4330.0 NaN NaN

108 rows × 13 columns

Method-2

In [320]:
df_who[df_who['Population'].le(10000)]
Out[320]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
7 Armenia Europe 2969 20.34 14.06 1.74 71 16.4 103.57 99.6 6100.0 NaN NaN
9 Austria Europe 8464 14.51 23.52 1.44 81 4.0 154.78 NaN 42050.0 NaN NaN
10 Azerbaijan Europe 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
12 Bahrain Eastern Mediterranean 1318 20.16 3.38 2.12 79 9.6 127.96 91.9 NaN NaN NaN
14 Barbados Americas 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
15 Belarus Europe 9405 15.10 19.31 1.47 71 5.2 111.88 NaN 14460.0 NaN NaN
17 Belize Americas 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
19 Bhutan South-East Asia 742 28.53 6.90 2.32 67 44.6 65.58 NaN 5570.0 88.3 91.5
21 Bosnia and Herzegovina Europe 3834 16.35 20.52 1.26 76 6.7 84.52 97.9 9190.0 86.5 88.4
22 Botswana Africa 2004 33.75 5.63 2.71 66 53.3 142.82 84.5 14550.0 NaN NaN
24 Brunei Darussalam Western Pacific 412 25.75 7.03 2.03 77 8.0 109.17 95.2 NaN NaN NaN
25 Bulgaria Europe 7278 13.53 26.11 1.51 74 12.1 140.68 NaN 14160.0 99.3 99.7
27 Burundi Africa 9850 44.20 3.87 6.21 53 104.3 22.33 67.2 610.0 NaN NaN
31 Cape Verde Africa 494 30.17 7.05 2.38 72 22.2 79.19 84.3 3980.0 94.6 92.4
32 Central African Republic Africa 4525 40.07 5.74 4.54 48 128.6 40.65 56.0 810.0 81.3 60.6
37 Comoros Africa 718 42.17 4.50 4.85 62 77.6 28.71 74.9 1110.0 NaN NaN
38 Congo Africa 4337 42.37 5.13 5.05 58 96.0 93.84 NaN 3240.0 92.3 89.3
39 Cook Islands Western Pacific 21 30.61 9.07 NaN 77 10.6 NaN NaN NaN 97.6 99.3
40 Costa Rica Americas 4805 23.94 10.15 1.83 79 9.9 92.20 96.2 11860.0 NaN NaN
42 Croatia Europe 4307 14.98 24.69 1.48 77 4.7 116.37 98.8 18760.0 94.8 97.0
44 Cyprus Europe 1129 17.16 16.92 1.47 81 3.2 97.71 98.3 NaN 99.1 99.5
48 Denmark Europe 5598 17.66 23.90 1.88 79 3.7 128.47 NaN 41900.0 94.8 96.9
49 Djibouti Eastern Mediterranean 860 33.72 5.96 3.53 58 80.9 21.32 NaN NaN NaN NaN
50 Dominica Americas 72 25.96 12.35 NaN 74 12.6 164.02 NaN 13000.0 NaN NaN
54 El Salvador Americas 6297 30.62 9.64 2.24 72 15.9 133.54 84.5 6640.0 95.2 95.5
55 Equatorial Guinea Africa 736 38.95 4.53 5.04 54 100.3 59.15 93.9 25620.0 56.5 56.0
... ... ... ... ... ... ... ... ... ... ... ... ... ...
137 Qatar Eastern Mediterranean 2051 13.28 1.73 2.06 82 7.4 123.11 96.3 86440.0 95.7 96.6
139 Republic of Moldova Europe 3514 16.52 16.72 1.47 71 17.6 104.80 98.5 3640.0 90.1 90.1
143 Saint Kitts and Nevis Americas 54 25.96 12.35 NaN 74 9.2 NaN NaN 16470.0 85.8 86.2
144 Saint Lucia Americas 181 24.31 12.13 1.96 75 17.5 123.00 NaN 11220.0 90.2 89.2
145 Saint Vincent and the Grenadines Americas 109 25.70 9.92 2.05 74 23.4 120.52 NaN 10440.0 NaN NaN
146 Samoa Western Pacific 189 37.88 7.39 4.28 73 17.8 NaN 98.8 4270.0 93.2 97.1
147 San Marino Europe 31 14.04 26.97 NaN 83 3.3 111.75 NaN NaN NaN NaN
148 Sao Tome and Principe Africa 188 41.60 4.76 4.22 63 53.2 68.26 89.2 2080.0 NaN NaN
151 Serbia Europe 9553 16.45 20.52 1.37 74 6.6 125.39 97.9 11540.0 94.7 94.4
152 Seychelles Africa 92 21.95 10.05 2.23 74 13.1 145.71 91.8 25140.0 NaN NaN
153 Sierra Leone Africa 5979 41.74 4.41 4.86 47 181.6 35.63 42.1 840.0 NaN NaN
154 Singapore Western Pacific 5303 16.48 15.13 1.27 82 2.9 150.24 95.9 59380.0 NaN NaN
155 Slovakia Europe 5446 15.00 18.60 1.37 76 7.5 109.35 NaN 22130.0 NaN NaN
156 Slovenia Europe 2068 14.16 23.16 1.49 80 3.1 106.56 99.7 26510.0 97.7 97.3
157 Solomon Islands Western Pacific 550 40.37 5.10 4.17 70 31.1 49.77 NaN 2350.0 87.7 87.3
164 Suriname Americas 535 27.83 9.55 2.32 72 20.8 178.88 94.7 NaN NaN NaN
165 Swaziland Africa 1231 38.05 5.34 3.48 50 79.7 63.70 87.4 5930.0 NaN NaN
166 Sweden Europe 9511 16.71 25.32 1.93 82 2.9 118.57 NaN 42200.0 99.7 99.0
167 Switzerland Europe 7997 14.79 23.25 1.51 83 4.3 131.43 NaN 52570.0 98.9 99.5
169 Tajikistan Europe 8009 35.75 4.80 3.81 68 58.3 90.64 99.7 2300.0 99.5 96.0
171 The former Yugoslav Republic of Macedonia Europe 2106 16.89 17.56 1.44 75 7.4 107.24 97.3 11090.0 97.3 99.2
172 Timor-Leste South-East Asia 1114 46.33 5.16 6.11 64 56.7 53.23 58.3 NaN 86.2 85.6
173 Togo Africa 6643 41.89 4.44 4.75 56 95.5 50.45 NaN 1040.0 NaN NaN
174 Tonga Western Pacific 105 37.33 7.96 3.86 72 12.8 52.63 NaN 5000.0 NaN NaN
175 Trinidad and Tobago Americas 1337 20.73 13.18 1.80 71 20.7 135.64 98.8 NaN 97.7 97.0
178 Turkmenistan Europe 5173 28.65 6.30 2.38 63 52.8 68.77 99.6 8690.0 NaN NaN
179 Tuvalu Western Pacific 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN
182 United Arab Emirates Eastern Mediterranean 9206 14.41 0.81 1.84 76 8.4 148.62 NaN 47890.0 NaN NaN
186 Uruguay Americas 3395 22.05 18.59 2.07 77 7.2 140.75 98.1 14640.0 NaN NaN
188 Vanuatu Western Pacific 247 37.37 6.02 3.46 72 17.9 55.76 82.6 4330.0 NaN NaN

108 rows × 13 columns

Example-2

Method-1

In [321]:
df_who[df_who.Population==df_who.Population.max()]
Out[321]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
35 China Western Pacific 1390000 17.95 13.42 1.66 76 14.0 73.19 94.3 8390.0 NaN NaN

Method-2

In [322]:
df_who[df_who.Population==df_who['Population'].max()]
Out[322]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
35 China Western Pacific 1390000 17.95 13.42 1.66 76 14.0 73.19 94.3 8390.0 NaN NaN

Note:

  • Statement-1: df_who[df_who.Population==df_who.Population.max()]
  • Statement-2: df_who[df_who.Population==df_who['Population'].max()]
  • Statement-1 and Statement-2 are equivalent: df_who.Population and df_who['Population'] select the same column.
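
As a quick check of the note above, the sketch below compares both statements on a small hand-built frame (df_small is a hypothetical stand-in for df_who, with values copied from this chapter). It also shows idxmax() as one possible alternative, which is an addition and not part of the original examples:

```python
import pandas as pd

# Hypothetical stand-in for df_who; the index labels mimic the original row numbers.
df_small = pd.DataFrame({
    'Country': ['China', 'India', 'United States of America'],
    'Population': [1390000, 1240000, 318000],
}, index=[35, 77, 185])

by_attribute = df_small[df_small.Population == df_small.Population.max()]
by_bracket = df_small[df_small.Population == df_small['Population'].max()]
same_result = by_attribute.equals(by_bracket)

# idxmax() returns the index label of the first maximum;
# wrapping it in a list makes .loc return a DataFrame instead of a Series.
top_row = df_small.loc[[df_small['Population'].idxmax()]]

print(same_result)
print(top_row)
```

Note that the boolean mask keeps every row tied for the maximum, while idxmax() returns only the first one.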

Example-3

In [323]:
from IPython.display import YouTubeVideo
YouTubeVideo('YPItfQ87qjM',width=900, height=500)
Out[323]:
  • Filter rows where Population >= 20000 and LifeExpectancy == 60.

Method-1

In [324]:
df_who[(df_who.Population>=20000) & (df_who.LifeExpectancy==60)]
Out[324]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
58 Ethiopia Africa 91729 43.29 5.17 4.77 60 68.3 16.67 NaN 1110.0 84.8 79.5
88 Kenya Africa 43178 42.37 4.25 4.54 60 72.9 67.49 87.4 1710.0 NaN NaN

Method-2

In [325]:
df_who[df_who['Population'].ge(20000) & df_who['LifeExpectancy'].eq(60)]
Out[325]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
58 Ethiopia Africa 91729 43.29 5.17 4.77 60 68.3 16.67 NaN 1110.0 84.8 79.5
88 Kenya Africa 43178 42.37 4.25 4.54 60 72.9 67.49 87.4 1710.0 NaN NaN
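
When conditions are combined with &, each comparison needs its own parentheses, because & binds more tightly than >= and ==. The sketch below illustrates this on a hypothetical frame df_small (values copied from this chapter) and shows DataFrame.query() as one possible alternative spelling; query() is not used in the original examples:

```python
import pandas as pd

# Hypothetical stand-in for df_who; values copied from rows shown in this chapter.
df_small = pd.DataFrame({
    'Country': ['Afghanistan', 'Ethiopia', 'Kenya', 'Rwanda', 'India'],
    'Population': [29825, 91729, 43178, 11458, 1240000],
    'LifeExpectancy': [60, 60, 60, 60, 65],
})

# Parentheses around each comparison are required with & and |.
with_operators = df_small[(df_small.Population >= 20000) &
                          (df_small.LifeExpectancy == 60)]

# The same filter written as a query string.
with_query = df_small.query('Population >= 20000 and LifeExpectancy == 60')

print(with_operators)
```

Rwanda is dropped because its Population is below 20000, and India because its LifeExpectancy is not 60.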

Example-4

contains

In [326]:
df_who[df_who.Country.str.contains('istan')]
Out[326]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
128 Pakistan Eastern Mediterranean 179000 34.31 6.44 3.35 67 85.9 61.61 NaN 2870.0 81.3 66.5
169 Tajikistan Europe 8009 35.75 4.80 3.81 68 58.3 90.64 99.7 2300.0 99.5 96.0
178 Turkmenistan Europe 5173 28.65 6.30 2.38 63 52.8 68.77 99.6 8690.0 NaN NaN
187 Uzbekistan Europe 28541 28.90 6.38 2.38 68 39.6 91.65 99.4 3420.0 93.3 91.0
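
str.contains() takes a few optional parameters that are often useful in practice. The sketch below is an illustration on a hypothetical Series named countries (not the df_who column itself): regex=False treats the pattern as plain text, case=False ignores letter case, and na=False treats missing values as non-matches.

```python
import pandas as pd

# Hypothetical Series; the None entry stands in for a missing country name.
countries = pd.Series(['Afghanistan', 'Pakistan', 'Turkey', None])

# Literal, case-sensitive substring match; missing values count as False.
mask = countries.str.contains('istan', regex=False, na=False)
matches = countries[mask].tolist()

# Case-insensitive variant of the same match.
mask_ci = countries.str.contains('ISTAN', case=False, regex=False, na=False)

print(matches)
```

Without na=False, a missing value produces a missing entry in the mask, which cannot be used directly for filtering.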

Example-5

startswith

In [327]:
df_who[df_who['Country'].str.startswith('A')]
Out[327]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
6 Argentina Americas 41087 24.42 14.97 2.20 76 14.2 134.92 97.8 17130.0 NaN NaN
7 Armenia Europe 2969 20.34 14.06 1.74 71 16.4 103.57 99.6 6100.0 NaN NaN
8 Australia Western Pacific 23050 18.95 19.46 1.89 82 4.9 108.34 NaN 38110.0 96.9 97.5
9 Austria Europe 8464 14.51 23.52 1.44 81 4.0 154.78 NaN 42050.0 NaN NaN
10 Azerbaijan Europe 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1

Example-6

endswith

In [328]:
df_who[df_who['Country'].str.endswith('n')]
Out[328]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
10 Azerbaijan Europe 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1
12 Bahrain Eastern Mediterranean 1318 20.16 3.38 2.12 79 9.6 127.96 91.9 NaN NaN NaN
18 Benin Africa 10051 42.95 4.54 5.01 57 89.5 85.33 42.4 1620.0 NaN NaN
19 Bhutan South-East Asia 742 28.53 6.90 2.32 67 44.6 65.58 NaN 5570.0 88.3 91.5
29 Cameroon Africa 21700 43.08 4.89 4.94 53 94.9 52.35 NaN 2330.0 99.6 87.4
62 Gabon Africa 1633 38.49 7.38 4.18 62 62.0 117.32 88.4 13740.0 NaN NaN
85 Japan Western Pacific 127000 13.12 31.92 1.39 83 3.0 104.95 NaN 35330.0 NaN NaN
86 Jordan Eastern Mediterranean 7009 34.13 5.30 3.39 74 19.1 118.20 92.6 5930.0 90.8 90.7
87 Kazakhstan Europe 16271 25.46 10.04 2.52 67 18.7 155.74 99.7 11250.0 NaN NaN
91 Kyrgyzstan Europe 5474 30.21 6.34 3.03 69 26.6 116.40 NaN 2180.0 95.5 95.1
94 Lebanon Eastern Mediterranean 4647 21.64 12.03 1.50 74 9.3 78.65 NaN 14470.0 93.5 92.9
127 Oman Eastern Mediterranean 3314 24.19 3.99 2.90 72 11.6 168.97 NaN NaN NaN NaN
128 Pakistan Eastern Mediterranean 179000 34.31 6.44 3.35 67 85.9 61.61 NaN 2870.0 81.3 66.5
141 Russian Federation Europe 143000 15.45 18.60 1.51 69 10.3 179.31 99.6 20560.0 NaN NaN
160 South Sudan Eastern Mediterranean 10838 42.28 5.26 5.10 54 104.0 NaN NaN NaN NaN NaN
161 Spain Europe 46755 15.20 22.86 1.47 82 4.5 113.22 97.7 31400.0 99.7 99.8
163 Sudan Eastern Mediterranean 37195 41.48 4.99 4.56 62 73.1 56.14 71.1 2120.0 NaN NaN
166 Sweden Europe 9511 16.71 25.32 1.93 82 2.9 118.57 NaN 42200.0 99.7 99.0
169 Tajikistan Europe 8009 35.75 4.80 3.81 68 58.3 90.64 99.7 2300.0 99.5 96.0
178 Turkmenistan Europe 5173 28.65 6.30 2.38 63 52.8 68.77 99.6 8690.0 NaN NaN
187 Uzbekistan Europe 28541 28.90 6.38 2.38 68 39.6 91.65 99.4 3420.0 93.3 91.0
191 Yemen Eastern Mediterranean 23852 40.72 4.54 4.35 64 60.0 47.05 63.9 2170.0 85.5 70.5
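
The startswith and endswith masks from Example-5 and Example-6 can be combined like any other boolean masks. A minimal sketch, assuming a hypothetical Series named countries with names taken from this chapter:

```python
import pandas as pd

# Hypothetical Series; country names copied from rows shown in this chapter.
countries = pd.Series(['Afghanistan', 'Albania', 'Azerbaijan', 'Yemen', 'Zambia'])

starts_a = countries.str.startswith('A')   # case-sensitive prefix test
ends_n = countries.str.endswith('n')       # case-sensitive suffix test

# Names that both start with 'A' and end with 'n'.
both = countries[starts_a & ends_n].tolist()

print(both)
```

Both tests are case-sensitive, so startswith('a') would match none of these names.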

Example-7

  • Select specific columns for the row with the maximum Population.
In [329]:
df_who[['Region','CellularSubscribers','Population']][df_who.Population==df_who['Population'].max()]
Out[329]:
Region CellularSubscribers Population
35 Western Pacific 73.19 1390000
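
Example-7 selects the columns first and then filters the rows. The same result can be obtained in one step with .loc, which takes the row mask and the column list together. This is a sketch on a hypothetical frame df_small (values copied from this chapter), not a change to the original cell:

```python
import pandas as pd

# Hypothetical stand-in for df_who; the index labels mimic the original row numbers.
df_small = pd.DataFrame({
    'Country': ['China', 'India', 'United States of America'],
    'Region': ['Western Pacific', 'South-East Asia', 'Americas'],
    'Population': [1390000, 1240000, 318000],
    'CellularSubscribers': [73.19, 72.00, 92.72],
}, index=[35, 77, 185])

cols = ['Region', 'CellularSubscribers', 'Population']
is_max = df_small['Population'] == df_small['Population'].max()

chained = df_small[cols][is_max]        # two-step form used in Example-7
with_loc = df_small.loc[is_max, cols]   # single-step form: rows, then columns

print(with_loc)
```

The single-step form is generally the safer habit when the selection is later used for assignment, since chained indexing may operate on a temporary copy.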

Example-8

unique

  • Unique Values in a Single Variable
In [330]:
df_who.Region.unique()
Out[330]:
array(['Eastern Mediterranean', 'Europe', 'Africa', 'Americas',
       'Western Pacific', 'South-East Asia'], dtype=object)

nunique

  • Called on a single column, nunique() returns the number of distinct values as an integer; missing values (NaN) are excluded by default.
In [331]:
df_who.Region.nunique()
Out[331]:
6
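
A short sketch of how unique() and nunique() treat missing values, using a hypothetical Series named regions. value_counts() is shown as a related call that the original examples do not use:

```python
import pandas as pd

# Hypothetical Series; the None entry stands in for a missing Region.
regions = pd.Series(['Eastern Mediterranean', 'Europe', 'Africa',
                     'Europe', 'Africa', None])

distinct = regions.unique()                    # distinct values, missing value included
n_distinct = regions.nunique()                 # missing value excluded by default
n_with_missing = regions.nunique(dropna=False) # missing value counted as one more
counts = regions.value_counts()                # rows per distinct value, missing excluded

print(n_distinct, n_with_missing)
print(counts)
```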
  • To show only the rows whose Region is 'Africa', 'Americas', or 'South-East Asia', combine several conditions with the "or" operator (|). The comparison is case-sensitive, so each string must match the stored value exactly:
In [332]:
df_who[(df_who.Region == 'Africa') |
       (df_who.Region == 'Americas') |
       (df_who.Region == 'South-East Asia')]
Out[332]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
6 Argentina Americas 41087 24.42 14.97 2.20 76 14.2 134.92 97.8 17130.0 NaN NaN
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
13 Bangladesh South-East Asia 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN
14 Barbados Americas 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
17 Belize Americas 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
18 Benin Africa 10051 42.95 4.54 5.01 57 89.5 85.33 42.4 1620.0 NaN NaN
19 Bhutan South-East Asia 742 28.53 6.90 2.32 67 44.6 65.58 NaN 5570.0 88.3 91.5
20 Bolivia (Plurinational State of) Americas 10496 35.23 7.28 3.31 67 41.4 82.82 NaN 4890.0 91.2 91.5
22 Botswana Africa 2004 33.75 5.63 2.71 66 53.3 142.82 84.5 14550.0 NaN NaN
23 Brazil Americas 199000 24.56 10.81 1.82 74 14.4 124.26 NaN 11420.0 NaN NaN
26 Burkina Faso Africa 16460 45.66 3.88 5.78 56 102.4 45.27 NaN 1300.0 60.7 55.9
27 Burundi Africa 9850 44.20 3.87 6.21 53 104.3 22.33 67.2 610.0 NaN NaN
29 Cameroon Africa 21700 43.08 4.89 4.94 53 94.9 52.35 NaN 2330.0 99.6 87.4
30 Canada Americas 34838 16.37 20.82 1.66 82 5.3 79.73 NaN 39660.0 NaN NaN
31 Cape Verde Africa 494 30.17 7.05 2.38 72 22.2 79.19 84.3 3980.0 94.6 92.4
32 Central African Republic Africa 4525 40.07 5.74 4.54 48 128.6 40.65 56.0 810.0 81.3 60.6
33 Chad Africa 12448 48.52 3.80 6.49 51 149.8 31.80 34.5 1360.0 NaN NaN
34 Chile Americas 17465 21.38 13.80 1.84 79 9.1 129.71 NaN 16330.0 94.3 94.4
36 Colombia Americas 47704 28.03 9.19 2.35 78 17.6 98.45 93.4 9560.0 91.7 91.3
37 Comoros Africa 718 42.17 4.50 4.85 62 77.6 28.71 74.9 1110.0 NaN NaN
38 Congo Africa 4337 42.37 5.13 5.05 58 96.0 93.84 NaN 3240.0 92.3 89.3
40 Costa Rica Americas 4805 23.94 10.15 1.83 79 9.9 92.20 96.2 11860.0 NaN NaN
41 Ivory Coast Africa 19840 41.48 5.10 4.91 56 107.6 86.06 56.2 1710.0 NaN NaN
43 Cuba Americas 11271 16.58 17.95 1.46 78 5.5 11.69 99.8 NaN 100.0 99.7
46 Democratic People's Republic of Korea South-East Asia 24763 21.98 12.74 2.00 69 28.8 4.09 NaN NaN NaN NaN
47 Democratic Republic of the Congo Africa 65705 45.11 4.51 6.15 49 145.7 23.09 66.8 340.0 NaN NaN
50 Dominica Americas 72 25.96 12.35 NaN 74 12.6 164.02 NaN 13000.0 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ...
119 Nepal South-East Asia 27474 35.58 7.65 2.50 68 41.6 43.81 60.3 1260.0 NaN NaN
122 Nicaragua Americas 5992 33.37 6.59 2.59 73 24.4 82.15 NaN 3730.0 93.2 94.5
123 Niger Africa 17157 49.99 4.26 7.58 56 113.5 29.52 NaN 720.0 64.2 52.0
124 Nigeria Africa 169000 44.23 4.49 6.02 53 123.7 58.58 61.3 2290.0 60.1 54.8
130 Panama Americas 3802 28.65 10.13 2.52 77 18.5 188.60 94.1 14510.0 99.1 98.2
132 Paraguay Americas 6687 32.78 8.01 2.93 75 22.0 99.40 93.9 5390.0 84.4 83.9
133 Peru Americas 29988 29.18 9.12 2.48 77 18.2 110.41 NaN 9440.0 97.8 98.5
142 Rwanda Africa 11458 43.56 3.94 4.73 60 55.0 40.63 71.1 1270.0 NaN NaN
143 Saint Kitts and Nevis Americas 54 25.96 12.35 NaN 74 9.2 NaN NaN 16470.0 85.8 86.2
144 Saint Lucia Americas 181 24.31 12.13 1.96 75 17.5 123.00 NaN 11220.0 90.2 89.2
145 Saint Vincent and the Grenadines Americas 109 25.70 9.92 2.05 74 23.4 120.52 NaN 10440.0 NaN NaN
148 Sao Tome and Principe Africa 188 41.60 4.76 4.22 63 53.2 68.26 89.2 2080.0 NaN NaN
150 Senegal Africa 13726 43.54 4.57 5.02 61 59.6 73.25 NaN 1940.0 75.9 80.2
152 Seychelles Africa 92 21.95 10.05 2.23 74 13.1 145.71 91.8 25140.0 NaN NaN
153 Sierra Leone Africa 5979 41.74 4.41 4.86 47 181.6 35.63 42.1 840.0 NaN NaN
159 South Africa Africa 52386 29.53 8.44 2.44 58 44.6 126.83 NaN 10710.0 NaN NaN
162 Sri Lanka South-East Asia 21098 25.15 12.40 2.35 75 9.6 87.05 91.2 5520.0 93.9 94.4
164 Suriname Americas 535 27.83 9.55 2.32 72 20.8 178.88 94.7 NaN NaN NaN
165 Swaziland Africa 1231 38.05 5.34 3.48 50 79.7 63.70 87.4 5930.0 NaN NaN
170 Thailand South-East Asia 66785 18.47 13.96 1.43 74 13.2 111.63 NaN 8360.0 NaN NaN
172 Timor-Leste South-East Asia 1114 46.33 5.16 6.11 64 56.7 53.23 58.3 NaN 86.2 85.6
173 Togo Africa 6643 41.89 4.44 4.75 56 95.5 50.45 NaN 1040.0 NaN NaN
175 Trinidad and Tobago Americas 1337 20.73 13.18 1.80 71 20.7 135.64 98.8 NaN 97.7 97.0
180 Uganda Africa 36346 48.54 3.72 6.06 56 68.9 48.38 73.2 1310.0 89.7 92.3
184 United Republic of Tanzania Africa 47783 44.85 4.89 5.36 59 54.0 55.53 73.2 1500.0 NaN NaN
185 United States of America Americas 318000 19.63 19.31 2.00 79 7.1 92.72 NaN 48820.0 95.4 96.1
186 Uruguay Americas 3395 22.05 18.59 2.07 77 7.2 140.75 98.1 14640.0 NaN NaN
189 Venezuela (Bolivarian Republic of) Americas 29955 28.84 9.17 2.44 75 15.3 97.78 NaN 12430.0 94.7 95.1
192 Zambia Africa 14075 46.73 3.95 5.77 55 88.5 60.59 71.2 1490.0 91.4 93.9
193 Zimbabwe Africa 13724 40.24 5.68 3.64 54 89.8 72.13 92.2 NaN NaN NaN

92 rows × 13 columns

isin

  • You can write this filter more clearly with the isin() method by passing it a list of regions:
  • Called on a column (a Series), isin() returns a boolean Series showing whether each element is contained in the given values.
In [333]:
df_who[df_who.Region.isin(['Africa', 'Americas', 'South-East Asia'])]
Out[333]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
6 Argentina Americas 41087 24.42 14.97 2.20 76 14.2 134.92 97.8 17130.0 NaN NaN
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
13 Bangladesh South-East Asia 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN
14 Barbados Americas 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
17 Belize Americas 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
18 Benin Africa 10051 42.95 4.54 5.01 57 89.5 85.33 42.4 1620.0 NaN NaN
19 Bhutan South-East Asia 742 28.53 6.90 2.32 67 44.6 65.58 NaN 5570.0 88.3 91.5
20 Bolivia (Plurinational State of) Americas 10496 35.23 7.28 3.31 67 41.4 82.82 NaN 4890.0 91.2 91.5
22 Botswana Africa 2004 33.75 5.63 2.71 66 53.3 142.82 84.5 14550.0 NaN NaN
23 Brazil Americas 199000 24.56 10.81 1.82 74 14.4 124.26 NaN 11420.0 NaN NaN
26 Burkina Faso Africa 16460 45.66 3.88 5.78 56 102.4 45.27 NaN 1300.0 60.7 55.9
27 Burundi Africa 9850 44.20 3.87 6.21 53 104.3 22.33 67.2 610.0 NaN NaN
29 Cameroon Africa 21700 43.08 4.89 4.94 53 94.9 52.35 NaN 2330.0 99.6 87.4
30 Canada Americas 34838 16.37 20.82 1.66 82 5.3 79.73 NaN 39660.0 NaN NaN
31 Cape Verde Africa 494 30.17 7.05 2.38 72 22.2 79.19 84.3 3980.0 94.6 92.4
32 Central African Republic Africa 4525 40.07 5.74 4.54 48 128.6 40.65 56.0 810.0 81.3 60.6
33 Chad Africa 12448 48.52 3.80 6.49 51 149.8 31.80 34.5 1360.0 NaN NaN
34 Chile Americas 17465 21.38 13.80 1.84 79 9.1 129.71 NaN 16330.0 94.3 94.4
36 Colombia Americas 47704 28.03 9.19 2.35 78 17.6 98.45 93.4 9560.0 91.7 91.3
37 Comoros Africa 718 42.17 4.50 4.85 62 77.6 28.71 74.9 1110.0 NaN NaN
38 Congo Africa 4337 42.37 5.13 5.05 58 96.0 93.84 NaN 3240.0 92.3 89.3
40 Costa Rica Americas 4805 23.94 10.15 1.83 79 9.9 92.20 96.2 11860.0 NaN NaN
41 Ivory Coast Africa 19840 41.48 5.10 4.91 56 107.6 86.06 56.2 1710.0 NaN NaN
43 Cuba Americas 11271 16.58 17.95 1.46 78 5.5 11.69 99.8 NaN 100.0 99.7
46 Democratic People's Republic of Korea South-East Asia 24763 21.98 12.74 2.00 69 28.8 4.09 NaN NaN NaN NaN
47 Democratic Republic of the Congo Africa 65705 45.11 4.51 6.15 49 145.7 23.09 66.8 340.0 NaN NaN
50 Dominica Americas 72 25.96 12.35 NaN 74 12.6 164.02 NaN 13000.0 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ...
119 Nepal South-East Asia 27474 35.58 7.65 2.50 68 41.6 43.81 60.3 1260.0 NaN NaN
122 Nicaragua Americas 5992 33.37 6.59 2.59 73 24.4 82.15 NaN 3730.0 93.2 94.5
123 Niger Africa 17157 49.99 4.26 7.58 56 113.5 29.52 NaN 720.0 64.2 52.0
124 Nigeria Africa 169000 44.23 4.49 6.02 53 123.7 58.58 61.3 2290.0 60.1 54.8
130 Panama Americas 3802 28.65 10.13 2.52 77 18.5 188.60 94.1 14510.0 99.1 98.2
132 Paraguay Americas 6687 32.78 8.01 2.93 75 22.0 99.40 93.9 5390.0 84.4 83.9
133 Peru Americas 29988 29.18 9.12 2.48 77 18.2 110.41 NaN 9440.0 97.8 98.5
142 Rwanda Africa 11458 43.56 3.94 4.73 60 55.0 40.63 71.1 1270.0 NaN NaN
143 Saint Kitts and Nevis Americas 54 25.96 12.35 NaN 74 9.2 NaN NaN 16470.0 85.8 86.2
144 Saint Lucia Americas 181 24.31 12.13 1.96 75 17.5 123.00 NaN 11220.0 90.2 89.2
145 Saint Vincent and the Grenadines Americas 109 25.70 9.92 2.05 74 23.4 120.52 NaN 10440.0 NaN NaN
148 Sao Tome and Principe Africa 188 41.60 4.76 4.22 63 53.2 68.26 89.2 2080.0 NaN NaN
150 Senegal Africa 13726 43.54 4.57 5.02 61 59.6 73.25 NaN 1940.0 75.9 80.2
152 Seychelles Africa 92 21.95 10.05 2.23 74 13.1 145.71 91.8 25140.0 NaN NaN
153 Sierra Leone Africa 5979 41.74 4.41 4.86 47 181.6 35.63 42.1 840.0 NaN NaN
159 South Africa Africa 52386 29.53 8.44 2.44 58 44.6 126.83 NaN 10710.0 NaN NaN
162 Sri Lanka South-East Asia 21098 25.15 12.40 2.35 75 9.6 87.05 91.2 5520.0 93.9 94.4
164 Suriname Americas 535 27.83 9.55 2.32 72 20.8 178.88 94.7 NaN NaN NaN
165 Swaziland Africa 1231 38.05 5.34 3.48 50 79.7 63.70 87.4 5930.0 NaN NaN
170 Thailand South-East Asia 66785 18.47 13.96 1.43 74 13.2 111.63 NaN 8360.0 NaN NaN
172 Timor-Leste South-East Asia 1114 46.33 5.16 6.11 64 56.7 53.23 58.3 NaN 86.2 85.6
173 Togo Africa 6643 41.89 4.44 4.75 56 95.5 50.45 NaN 1040.0 NaN NaN
175 Trinidad and Tobago Americas 1337 20.73 13.18 1.80 71 20.7 135.64 98.8 NaN 97.7 97.0
180 Uganda Africa 36346 48.54 3.72 6.06 56 68.9 48.38 73.2 1310.0 89.7 92.3
184 United Republic of Tanzania Africa 47783 44.85 4.89 5.36 59 54.0 55.53 73.2 1500.0 NaN NaN
185 United States of America Americas 318000 19.63 19.31 2.00 79 7.1 92.72 NaN 48820.0 95.4 96.1
186 Uruguay Americas 3395 22.05 18.59 2.07 77 7.2 140.75 98.1 14640.0 NaN NaN
189 Venezuela (Bolivarian Republic of) Americas 29955 28.84 9.17 2.44 75 15.3 97.78 NaN 12430.0 94.7 95.1
192 Zambia Africa 14075 46.73 3.95 5.77 55 88.5 60.59 71.2 1490.0 91.4 93.9
193 Zimbabwe Africa 13724 40.24 5.68 3.64 54 89.8 72.13 92.2 NaN NaN NaN

92 rows × 13 columns
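
The chain of "or" conditions and the isin() call produce the same mask, which is why both outputs have 92 rows. A minimal sketch that checks this on a hypothetical frame df_small (values copied from this chapter):

```python
import pandas as pd

# Hypothetical stand-in for df_who; values copied from rows shown in this chapter.
df_small = pd.DataFrame({
    'Country': ['Algeria', 'Argentina', 'Bangladesh', 'Albania', 'China'],
    'Region': ['Africa', 'Americas', 'South-East Asia', 'Europe', 'Western Pacific'],
})

wanted = ['Africa', 'Americas', 'South-East Asia']

or_mask = ((df_small.Region == 'Africa') |
           (df_small.Region == 'Americas') |
           (df_small.Region == 'South-East Asia'))
isin_mask = df_small.Region.isin(wanted)

same_mask = or_mask.equals(isin_mask)
selected = df_small[isin_mask]['Country'].tolist()

# isin() matches exactly, so a lowercase spelling selects nothing here.
lowercase_hits = int(df_small.Region.isin(['africa']).sum())

print(same_mask, selected, lowercase_hits)
```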

  • To reverse this filter, so that you exclude (rather than include) those three regions, put a tilde (~) in front of the condition:
In [334]:
df_who[~df_who.Region.isin(['Africa', 'Americas', 'South-East Asia'])]
Out[334]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
7 Armenia Europe 2969 20.34 14.06 1.74 71 16.4 103.57 99.6 6100.0 NaN NaN
8 Australia Western Pacific 23050 18.95 19.46 1.89 82 4.9 108.34 NaN 38110.0 96.9 97.5
9 Austria Europe 8464 14.51 23.52 1.44 81 4.0 154.78 NaN 42050.0 NaN NaN
10 Azerbaijan Europe 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1
12 Bahrain Eastern Mediterranean 1318 20.16 3.38 2.12 79 9.6 127.96 91.9 NaN NaN NaN
15 Belarus Europe 9405 15.10 19.31 1.47 71 5.2 111.88 NaN 14460.0 NaN NaN
16 Belgium Europe 11060 16.88 23.81 1.85 80 4.2 116.61 NaN 39190.0 98.9 99.2
21 Bosnia and Herzegovina Europe 3834 16.35 20.52 1.26 76 6.7 84.52 97.9 9190.0 86.5 88.4
24 Brunei Darussalam Western Pacific 412 25.75 7.03 2.03 77 8.0 109.17 95.2 NaN NaN NaN
25 Bulgaria Europe 7278 13.53 26.11 1.51 74 12.1 140.68 NaN 14160.0 99.3 99.7
28 Cambodia Western Pacific 14865 31.23 7.67 2.93 65 39.7 96.17 NaN 2230.0 96.4 95.4
35 China Western Pacific 1390000 17.95 13.42 1.66 76 14.0 73.19 94.3 8390.0 NaN NaN
39 Cook Islands Western Pacific 21 30.61 9.07 NaN 77 10.6 NaN NaN NaN 97.6 99.3
42 Croatia Europe 4307 14.98 24.69 1.48 77 4.7 116.37 98.8 18760.0 94.8 97.0
44 Cyprus Europe 1129 17.16 16.92 1.47 81 3.2 97.71 98.3 NaN 99.1 99.5
45 Czech Republic Europe 10660 14.56 23.23 1.53 78 3.8 123.44 NaN 24370.0 NaN NaN
48 Denmark Europe 5598 17.66 23.90 1.88 79 3.7 128.47 NaN 41900.0 94.8 96.9
49 Djibouti Eastern Mediterranean 860 33.72 5.96 3.53 58 80.9 21.32 NaN NaN NaN NaN
53 Egypt Eastern Mediterranean 80722 31.25 8.62 2.85 73 21.0 101.08 72.0 6120.0 NaN NaN
57 Estonia Europe 1291 15.69 23.92 1.62 76 3.6 138.98 99.8 20850.0 97.7 97.0
59 Fiji Western Pacific 875 28.88 8.38 2.64 70 22.4 83.72 NaN 4610.0 NaN NaN
60 Finland Europe 5408 16.42 25.90 1.85 81 2.9 166.02 NaN 37670.0 97.7 97.9
61 France Europe 63937 18.26 23.82 1.98 82 4.1 94.79 NaN 35910.0 99.1 99.3
64 Georgia Europe 4358 17.62 19.47 1.82 72 19.9 102.31 99.7 5350.0 NaN NaN
65 Germany Europe 82800 13.17 26.72 1.40 81 4.1 132.30 NaN 40230.0 NaN NaN
67 Greece Europe 11125 14.60 25.41 1.51 81 4.8 106.48 97.2 25100.0 98.8 99.3
75 Hungary Europe 9976 14.62 23.41 1.38 75 6.2 117.30 99.0 20310.0 97.8 98.3
... ... ... ... ... ... ... ... ... ... ... ... ... ...
141 Russian Federation Europe 143000 15.45 18.60 1.51 69 10.3 179.31 99.6 20560.0 NaN NaN
146 Samoa Western Pacific 189 37.88 7.39 4.28 73 17.8 NaN 98.8 4270.0 93.2 97.1
147 San Marino Europe 31 14.04 26.97 NaN 83 3.3 111.75 NaN NaN NaN NaN
149 Saudi Arabia Eastern Mediterranean 28288 29.69 4.59 2.76 76 8.6 191.24 86.6 24700.0 96.7 96.5
151 Serbia Europe 9553 16.45 20.52 1.37 74 6.6 125.39 97.9 11540.0 94.7 94.4
154 Singapore Western Pacific 5303 16.48 15.13 1.27 82 2.9 150.24 95.9 59380.0 NaN NaN
155 Slovakia Europe 5446 15.00 18.60 1.37 76 7.5 109.35 NaN 22130.0 NaN NaN
156 Slovenia Europe 2068 14.16 23.16 1.49 80 3.1 106.56 99.7 26510.0 97.7 97.3
157 Solomon Islands Western Pacific 550 40.37 5.10 4.17 70 31.1 49.77 NaN 2350.0 87.7 87.3
158 Somalia Eastern Mediterranean 10195 47.35 4.46 6.77 50 147.4 6.85 NaN NaN NaN NaN
160 South Sudan Eastern Mediterranean 10838 42.28 5.26 5.10 54 104.0 NaN NaN NaN NaN NaN
161 Spain Europe 46755 15.20 22.86 1.47 82 4.5 113.22 97.7 31400.0 99.7 99.8
163 Sudan Eastern Mediterranean 37195 41.48 4.99 4.56 62 73.1 56.14 71.1 2120.0 NaN NaN
166 Sweden Europe 9511 16.71 25.32 1.93 82 2.9 118.57 NaN 42200.0 99.7 99.0
167 Switzerland Europe 7997 14.79 23.25 1.51 83 4.3 131.43 NaN 52570.0 98.9 99.5
168 Syrian Arab Republic Eastern Mediterranean 21890 35.35 6.09 3.04 75 15.1 63.17 83.4 NaN NaN NaN
169 Tajikistan Europe 8009 35.75 4.80 3.81 68 58.3 90.64 99.7 2300.0 99.5 96.0
171 The former Yugoslav Republic of Macedonia Europe 2106 16.89 17.56 1.44 75 7.4 107.24 97.3 11090.0 97.3 99.2
174 Tonga Western Pacific 105 37.33 7.96 3.86 72 12.8 52.63 NaN 5000.0 NaN NaN
176 Tunisia Eastern Mediterranean 10875 23.22 10.49 2.04 76 16.1 116.93 NaN 9030.0 NaN NaN
177 Turkey Europe 73997 26.00 10.56 2.08 76 14.2 88.70 NaN 16940.0 99.5 98.3
178 Turkmenistan Europe 5173 28.65 6.30 2.38 63 52.8 68.77 99.6 8690.0 NaN NaN
179 Tuvalu Western Pacific 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN
181 Ukraine Europe 45530 14.18 20.76 1.45 71 10.7 122.98 99.7 7040.0 90.8 91.5
182 United Arab Emirates Eastern Mediterranean 9206 14.41 0.81 1.84 76 8.4 148.62 NaN 47890.0 NaN NaN
183 United Kingdom Europe 62783 17.54 23.06 1.90 80 4.8 130.75 NaN 36010.0 99.8 99.6
187 Uzbekistan Europe 28541 28.90 6.38 2.38 68 39.6 91.65 99.4 3420.0 93.3 91.0
188 Vanuatu Western Pacific 247 37.37 6.02 3.46 72 17.9 55.76 82.6 4330.0 NaN NaN
190 Viet Nam Western Pacific 90796 22.87 9.32 1.79 75 23.0 143.39 93.2 3250.0 NaN NaN
191 Yemen Eastern Mediterranean 23852 40.72 4.54 4.35 64 60.0 47.05 63.9 2170.0 85.5 70.5

102 rows × 13 columns

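  • This works because the tilde (~) acts as an element-wise "not" on a boolean pandas Series: every True becomes False and vice versa. (In plain Python, ~ is the bitwise NOT operator; the keyword not cannot be applied to a whole Series.)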
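As a minimal sketch of the point above, the following uses a small made-up DataFrame (the names df_demo, mask and inverted are illustrative assumptions, not part of the WHO example) to show that ~ flips a boolean mask, while the keyword not raises an error on a Series.

```python
import pandas as pd

# Hypothetical toy data, not the WHO dataset.
df_demo = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Region": ["Europe", "Africa", "Europe", "Americas"],
})

mask = df_demo["Region"].isin(["Europe"])  # True where Region is Europe
inverted = ~mask                           # element-wise negation of the mask

print(df_demo[mask]["Country"].tolist())      # ['A', 'C']
print(df_demo[inverted]["Country"].tolist())  # ['B', 'D']

# The keyword `not` asks for one truth value and fails on a multi-element Series.
try:
    not mask
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```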

Filter a DataFrame by largest/smallest categories

  • Let's say that you needed to filter the WHO DataFrame by Region, but only include the 3 largest regions (the ones with the most rows).
  • We'll start by taking the value_counts() of Region, which returns a Series of row counts per region:

value_counts()

In [335]:
from IPython.display import YouTubeVideo
YouTubeVideo('QTVTq8SPzxM',width=900, height=500)
Out[335]:
In [336]:
df_who.Region.value_counts()
Out[336]:
Europe                   53
Africa                   46
Americas                 35
Western Pacific          27
Eastern Mediterranean    22
South-East Asia          11
Name: Region, dtype: int64
  • The Series method nlargest() makes it easy to select the 3 largest values in this Series; it is applied in the nlargest() section further down.
  • nlargest(n) returns the first n values, ordered from largest to smallest.

sort_index

In [337]:
from IPython.display import YouTubeVideo
YouTubeVideo('15q-is8P_H4',width=900, height=500)
Out[337]:

Example-1

In [338]:
df_who.Region.value_counts().sort_index()
Out[338]:
Africa                   46
Americas                 35
Eastern Mediterranean    22
Europe                   53
South-East Asia          11
Western Pacific          27
Name: Region, dtype: int64

Example-2

In [339]:
df_who.Region.value_counts(normalize=True).sort_index()
Out[339]:
Africa                   0.237113
Americas                 0.180412
Eastern Mediterranean    0.113402
Europe                   0.273196
South-East Asia          0.056701
Western Pacific          0.139175
Name: Region, dtype: float64
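To make the difference between the default ordering of value_counts() and sort_index() concrete, here is a minimal sketch on a small made-up Series. The labels and the variable names (regions, counts, by_label, shares) are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy column; labels chosen only for illustration.
regions = pd.Series(["Europe", "Africa", "Europe", "Americas", "Europe", "Africa"])

counts = regions.value_counts()   # ordered by count, largest first
by_label = counts.sort_index()    # same counts, ordered alphabetically by label
shares = regions.value_counts(normalize=True).sort_index()  # proportions, not counts

print(counts.index.tolist())    # ['Europe', 'Africa', 'Americas']
print(by_label.index.tolist())  # ['Africa', 'Americas', 'Europe']
print(shares["Europe"])         # 0.5
```

With normalize=True the values are fractions of the total, so they add up to 1.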

Example-3

In [340]:
df_who = pd.read_csv('WHO_csv.csv')
df_who.head()
Out[340]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
In [341]:
df_who.sort_index(axis=1, inplace=True)
In [342]:
df_who.head()
Out[342]:
CellularSubscribers ChildMortality Country FertilityRate GNI LifeExpectancy LiteracyRate Over60 Population PrimarySchoolEnrollmentFemale PrimarySchoolEnrollmentMale Region Under15
0 54.26 98.5 Afghanistan 5.40 1140.0 60 NaN 3.82 29825 NaN NaN Eastern Mediterranean 47.42
1 96.39 16.7 Albania 1.75 8820.0 74 NaN 14.93 3162 NaN NaN Europe 21.33
2 98.99 20.0 Algeria 2.83 8310.0 73 NaN 7.17 38482 96.4 98.2 Africa 27.42
3 75.49 3.2 Andorra NaN NaN 82 NaN 22.86 78 79.4 78.4 Europe 15.20
4 48.38 163.5 Angola 6.10 5230.0 51 70.1 3.84 20821 78.2 93.1 Africa 47.58
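The column reordering above can also be done without inplace=True. The sketch below is a hedged illustration on a two-row frame (df_demo and sorted_cols are hypothetical names); it shows that sort_index(axis=1) returns a new DataFrame with the columns in alphabetical order and leaves the original untouched.

```python
import pandas as pd

# Hypothetical two-row frame with columns deliberately out of order.
df_demo = pd.DataFrame({
    "Region": ["Europe", "Africa"],
    "Country": ["Albania", "Algeria"],
    "Population": [3162, 38482],
})

sorted_cols = df_demo.sort_index(axis=1)  # returns a new DataFrame

print(df_demo.columns.tolist())      # ['Region', 'Country', 'Population']
print(sorted_cols.columns.tolist())  # ['Country', 'Population', 'Region']
```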

nlargest()

In [343]:
df_who.Region.value_counts().nlargest(n=3)
Out[343]:
Europe      53
Africa      46
Americas    35
Name: Region, dtype: int64
  • And all we actually need from this Series is the index:
In [344]:
df_who.Region.value_counts().nlargest(n=3).index
Out[344]:
Index(['Europe', 'Africa', 'Americas'], dtype='object')
In [345]:
df_who[df_who.Region.isin(df_who.Region.value_counts().nlargest(3).index)]
Out[345]:
CellularSubscribers ChildMortality Country FertilityRate GNI LifeExpectancy LiteracyRate Over60 Population PrimarySchoolEnrollmentFemale PrimarySchoolEnrollmentMale Region Under15
1 96.39 16.7 Albania 1.75 8820.0 74 NaN 14.93 3162 NaN NaN Europe 21.33
2 98.99 20.0 Algeria 2.83 8310.0 73 NaN 7.17 38482 96.4 98.2 Africa 27.42
3 75.49 3.2 Andorra NaN NaN 82 NaN 22.86 78 79.4 78.4 Europe 15.20
4 48.38 163.5 Angola 6.10 5230.0 51 70.1 3.84 20821 78.2 93.1 Africa 47.58
5 196.41 9.9 Antigua and Barbuda 2.12 17900.0 75 99.0 12.35 89 84.5 91.1 Americas 25.96
6 134.92 14.2 Argentina 2.20 17130.0 76 97.8 14.97 41087 NaN NaN Americas 24.42
7 103.57 16.4 Armenia 1.74 6100.0 71 99.6 14.06 2969 NaN NaN Europe 20.34
9 154.78 4.0 Austria 1.44 42050.0 81 NaN 23.52 8464 NaN NaN Europe 14.51
10 108.75 35.2 Azerbaijan 1.96 8960.0 71 NaN 8.24 9309 84.1 85.3 Europe 22.25
11 86.06 16.9 Bahamas 1.90 NaN 75 NaN 11.24 372 NaN NaN Americas 21.62
14 127.01 18.4 Barbados 1.84 NaN 78 NaN 15.78 283 NaN NaN Americas 18.99
15 111.88 5.2 Belarus 1.47 14460.0 71 NaN 19.31 9405 NaN NaN Europe 15.10
16 116.61 4.2 Belgium 1.85 39190.0 80 NaN 23.81 11060 99.2 98.9 Europe 16.88
17 69.96 18.3 Belize 2.76 6090.0 74 NaN 5.74 324 NaN NaN Americas 34.40
18 85.33 89.5 Benin 5.01 1620.0 57 42.4 4.54 10051 NaN NaN Africa 42.95
20 82.82 41.4 Bolivia (Plurinational State of) 3.31 4890.0 67 NaN 7.28 10496 91.5 91.2 Americas 35.23
21 84.52 6.7 Bosnia and Herzegovina 1.26 9190.0 76 97.9 20.52 3834 88.4 86.5 Europe 16.35
22 142.82 53.3 Botswana 2.71 14550.0 66 84.5 5.63 2004 NaN NaN Africa 33.75
23 124.26 14.4 Brazil 1.82 11420.0 74 NaN 10.81 199000 NaN NaN Americas 24.56
25 140.68 12.1 Bulgaria 1.51 14160.0 74 NaN 26.11 7278 99.7 99.3 Europe 13.53
26 45.27 102.4 Burkina Faso 5.78 1300.0 56 NaN 3.88 16460 55.9 60.7 Africa 45.66
27 22.33 104.3 Burundi 6.21 610.0 53 67.2 3.87 9850 NaN NaN Africa 44.20
29 52.35 94.9 Cameroon 4.94 2330.0 53 NaN 4.89 21700 87.4 99.6 Africa 43.08
30 79.73 5.3 Canada 1.66 39660.0 82 NaN 20.82 34838 NaN NaN Americas 16.37
31 79.19 22.2 Cape Verde 2.38 3980.0 72 84.3 7.05 494 92.4 94.6 Africa 30.17
32 40.65 128.6 Central African Republic 4.54 810.0 48 56.0 5.74 4525 60.6 81.3 Africa 40.07
33 31.80 149.8 Chad 6.49 1360.0 51 34.5 3.80 12448 NaN NaN Africa 48.52
34 129.71 9.1 Chile 1.84 16330.0 79 NaN 13.80 17465 94.4 94.3 Americas 21.38
36 98.45 17.6 Colombia 2.35 9560.0 78 93.4 9.19 47704 91.3 91.7 Americas 28.03
37 28.71 77.6 Comoros 4.85 1110.0 62 74.9 4.50 718 NaN NaN Africa 42.17
... ... ... ... ... ... ... ... ... ... ... ... ... ...
147 111.75 3.3 San Marino NaN NaN 83 NaN 26.97 31 NaN NaN Europe 14.04
148 68.26 53.2 Sao Tome and Principe 4.22 2080.0 63 89.2 4.76 188 NaN NaN Africa 41.60
150 73.25 59.6 Senegal 5.02 1940.0 61 NaN 4.57 13726 80.2 75.9 Africa 43.54
151 125.39 6.6 Serbia 1.37 11540.0 74 97.9 20.52 9553 94.4 94.7 Europe 16.45
152 145.71 13.1 Seychelles 2.23 25140.0 74 91.8 10.05 92 NaN NaN Africa 21.95
153 35.63 181.6 Sierra Leone 4.86 840.0 47 42.1 4.41 5979 NaN NaN Africa 41.74
155 109.35 7.5 Slovakia 1.37 22130.0 76 NaN 18.60 5446 NaN NaN Europe 15.00
156 106.56 3.1 Slovenia 1.49 26510.0 80 99.7 23.16 2068 97.3 97.7 Europe 14.16
159 126.83 44.6 South Africa 2.44 10710.0 58 NaN 8.44 52386 NaN NaN Africa 29.53
161 113.22 4.5 Spain 1.47 31400.0 82 97.7 22.86 46755 99.8 99.7 Europe 15.20
164 178.88 20.8 Suriname 2.32 NaN 72 94.7 9.55 535 NaN NaN Americas 27.83
165 63.70 79.7 Swaziland 3.48 5930.0 50 87.4 5.34 1231 NaN NaN Africa 38.05
166 118.57 2.9 Sweden 1.93 42200.0 82 NaN 25.32 9511 99.0 99.7 Europe 16.71
167 131.43 4.3 Switzerland 1.51 52570.0 83 NaN 23.25 7997 99.5 98.9 Europe 14.79
169 90.64 58.3 Tajikistan 3.81 2300.0 68 99.7 4.80 8009 96.0 99.5 Europe 35.75
171 107.24 7.4 The former Yugoslav Republic of Macedonia 1.44 11090.0 75 97.3 17.56 2106 99.2 97.3 Europe 16.89
173 50.45 95.5 Togo 4.75 1040.0 56 NaN 4.44 6643 NaN NaN Africa 41.89
175 135.64 20.7 Trinidad and Tobago 1.80 NaN 71 98.8 13.18 1337 97.0 97.7 Americas 20.73
177 88.70 14.2 Turkey 2.08 16940.0 76 NaN 10.56 73997 98.3 99.5 Europe 26.00
178 68.77 52.8 Turkmenistan 2.38 8690.0 63 99.6 6.30 5173 NaN NaN Europe 28.65
180 48.38 68.9 Uganda 6.06 1310.0 56 73.2 3.72 36346 92.3 89.7 Africa 48.54
181 122.98 10.7 Ukraine 1.45 7040.0 71 99.7 20.76 45530 91.5 90.8 Europe 14.18
183 130.75 4.8 United Kingdom 1.90 36010.0 80 NaN 23.06 62783 99.6 99.8 Europe 17.54
184 55.53 54.0 United Republic of Tanzania 5.36 1500.0 59 73.2 4.89 47783 NaN NaN Africa 44.85
185 92.72 7.1 United States of America 2.00 48820.0 79 NaN 19.31 318000 96.1 95.4 Americas 19.63
186 140.75 7.2 Uruguay 2.07 14640.0 77 98.1 18.59 3395 NaN NaN Americas 22.05
187 91.65 39.6 Uzbekistan 2.38 3420.0 68 99.4 6.38 28541 91.0 93.3 Europe 28.90
189 97.78 15.3 Venezuela (Bolivarian Republic of) 2.44 12430.0 75 NaN 9.17 29955 95.1 94.7 Americas 28.84
192 60.59 88.5 Zambia 5.77 1490.0 55 71.2 3.95 14075 93.9 91.4 Africa 46.73
193 72.13 89.8 Zimbabwe 3.64 NaN 54 92.2 5.68 13724 NaN NaN Africa 40.24

134 rows × 13 columns
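The one-line filter above chains several steps. As a rough sketch, assuming a small made-up frame in place of df_who (the names df_demo, counts, top3 and in_top3 are illustrative), the same steps can be split into named intermediate results:

```python
import pandas as pd

# Hypothetical toy data standing in for df_who.
df_demo = pd.DataFrame({
    "Country": list("ABCDEFGHIJ"),
    "Region": ["R1"] * 4 + ["R2"] * 3 + ["R3"] * 2 + ["R4"],
})

counts = df_demo["Region"].value_counts()  # R1: 4, R2: 3, R3: 2, R4: 1
top3 = counts.nlargest(3).index            # labels of the 3 largest regions
in_top3 = df_demo["Region"].isin(top3)     # boolean mask, one entry per row

kept = df_demo[in_top3]      # rows whose Region is among the 3 largest
dropped = df_demo[~in_top3]  # all remaining rows

print(sorted(top3))             # ['R1', 'R2', 'R3']
print(len(kept), len(dropped))  # 9 1
```

Storing the mask once also avoids recomputing value_counts() when both the matching and the non-matching rows are needed.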

In [346]:
df_who[~df_who.Region.isin(df_who.Region.value_counts().nlargest(3).index)]
Out[346]:
CellularSubscribers ChildMortality Country FertilityRate GNI LifeExpectancy LiteracyRate Over60 Population PrimarySchoolEnrollmentFemale PrimarySchoolEnrollmentMale Region Under15
0 54.26 98.5 Afghanistan 5.40 1140.0 60 NaN 3.82 29825 NaN NaN Eastern Mediterranean 47.42
8 108.34 4.9 Australia 1.89 38110.0 82 NaN 19.46 23050 97.5 96.9 Western Pacific 18.95
12 127.96 9.6 Bahrain 2.12 NaN 79 91.9 3.38 1318 NaN NaN Eastern Mediterranean 20.16
13 56.06 40.9 Bangladesh 2.24 1940.0 70 56.8 6.89 155000 NaN NaN South-East Asia 30.57
19 65.58 44.6 Bhutan 2.32 5570.0 67 NaN 6.90 742 91.5 88.3 South-East Asia 28.53
24 109.17 8.0 Brunei Darussalam 2.03 NaN 77 95.2 7.03 412 NaN NaN Western Pacific 25.75
28 96.17 39.7 Cambodia 2.93 2230.0 65 NaN 7.67 14865 95.4 96.4 Western Pacific 31.23
35 73.19 14.0 China 1.66 8390.0 76 94.3 13.42 1390000 NaN NaN Western Pacific 17.95
39 NaN 10.6 Cook Islands NaN NaN 77 NaN 9.07 21 99.3 97.6 Western Pacific 30.61
46 4.09 28.8 Democratic People's Republic of Korea 2.00 NaN 69 NaN 12.74 24763 NaN NaN South-East Asia 21.98
49 21.32 80.9 Djibouti 3.53 NaN 58 NaN 5.96 860 NaN NaN Eastern Mediterranean 33.72
53 101.08 21.0 Egypt 2.85 6120.0 73 72.0 8.62 80722 NaN NaN Eastern Mediterranean 31.25
59 83.72 22.4 Fiji 2.64 4610.0 70 NaN 8.38 875 NaN NaN Western Pacific 28.88
77 72.00 56.3 India 2.53 3590.0 65 NaN 8.10 1240000 NaN NaN South-East Asia 29.43
78 103.09 31.0 Indonesia 2.40 4500.0 69 NaN 7.86 247000 NaN NaN South-East Asia 29.27
79 74.93 17.6 Iran (Islamic Republic of) 1.91 NaN 73 NaN 7.82 76424 NaN NaN Eastern Mediterranean 23.68
80 78.12 34.4 Iraq 4.15 3750.0 69 78.2 4.95 32778 NaN NaN Eastern Mediterranean 40.51
85 104.95 3.0 Japan 1.39 35330.0 83 NaN 31.92 127000 NaN NaN Western Pacific 13.12
86 118.20 19.1 Jordan 3.39 5930.0 74 92.6 5.30 7009 90.7 90.8 Eastern Mediterranean 34.13
89 13.64 59.9 Kiribati 3.01 3300.0 67 NaN 8.84 101 NaN NaN Western Pacific 30.10
90 175.09 11.0 Kuwait 2.65 NaN 80 NaN 3.80 3250 NaN NaN Eastern Mediterranean 24.90
92 87.16 71.8 Lao People's Democratic Republic 3.20 2580.0 68 NaN 5.76 6646 95.4 98.1 Western Pacific 35.61
94 78.65 9.3 Lebanon 1.50 14470.0 74 NaN 12.03 4647 92.9 93.5 Eastern Mediterranean 21.64
97 155.70 15.4 Libya 2.47 NaN 65 89.2 6.96 6155 NaN NaN Eastern Mediterranean 29.45
102 127.04 8.5 Malaysia 1.99 15650.0 74 93.1 8.21 29240 NaN NaN Western Pacific 26.65
103 165.72 10.5 Maldives 2.31 7430.0 77 NaN 6.65 338 96.5 96.5 South-East Asia 29.03
106 NaN 37.9 Marshall Islands NaN NaN 60 NaN 8.84 53 NaN NaN Western Pacific 30.10
110 NaN 38.5 Micronesia (Federated States of) 3.40 3580.0 69 NaN 6.67 103 NaN NaN Western Pacific 35.81
112 105.08 27.5 Mongolia 2.45 4290.0 68 97.4 5.80 2796 98.5 99.6 Western Pacific 27.05
114 113.26 31.1 Morocco 2.65 4880.0 72 NaN 7.61 32521 NaN NaN Eastern Mediterranean 27.85
116 2.57 52.3 Myanmar 1.98 NaN 65 92.3 8.15 52797 NaN NaN South-East Asia 25.28
118 65.00 37.1 Nauru NaN NaN 71 NaN 8.84 10 NaN NaN Western Pacific 30.10
119 43.81 41.6 Nepal 2.50 1260.0 68 60.3 7.65 27474 NaN NaN South-East Asia 35.58
121 109.19 5.7 New Zealand 2.10 NaN 81 NaN 19.01 4460 99.6 99.3 Western Pacific 20.26
125 NaN 25.1 Niue NaN NaN 72 NaN 9.07 1 NaN NaN Western Pacific 30.61
127 168.97 11.6 Oman 2.90 NaN 72 NaN 3.99 3314 NaN NaN Eastern Mediterranean 24.19
128 61.61 85.9 Pakistan 3.35 2870.0 67 NaN 6.44 179000 66.5 81.3 Eastern Mediterranean 34.31
129 74.94 20.8 Palau NaN 11080.0 72 NaN 8.84 21 NaN NaN Western Pacific 30.10
131 34.22 63.0 Papua New Guinea 3.90 2570.0 63 60.6 4.79 7167 NaN NaN Western Pacific 38.37
134 99.30 29.8 Philippines 3.11 4140.0 69 NaN 6.21 96707 NaN NaN Western Pacific 34.53
137 123.11 7.4 Qatar 2.06 86440.0 82 96.3 1.73 2051 96.6 95.7 Eastern Mediterranean 13.28
138 108.50 3.8 Republic of Korea 1.29 30370.0 81 NaN 16.58 49003 98.4 99.3 Western Pacific 15.25
146 NaN 17.8 Samoa 4.28 4270.0 73 98.8 7.39 189 97.1 93.2 Western Pacific 37.88
149 191.24 8.6 Saudi Arabia 2.76 24700.0 76 86.6 4.59 28288 96.5 96.7 Eastern Mediterranean 29.69
154 150.24 2.9 Singapore 1.27 59380.0 82 95.9 15.13 5303 NaN NaN Western Pacific 16.48
157 49.77 31.1 Solomon Islands 4.17 2350.0 70 NaN 5.10 550 87.3 87.7 Western Pacific 40.37
158 6.85 147.4 Somalia 6.77 NaN 50 NaN 4.46 10195 NaN NaN Eastern Mediterranean 47.35
160 NaN 104.0 South Sudan 5.10 NaN 54 NaN 5.26 10838 NaN NaN Eastern Mediterranean 42.28
162 87.05 9.6 Sri Lanka 2.35 5520.0 75 91.2 12.40 21098 94.4 93.9 South-East Asia 25.15
163 56.14 73.1 Sudan 4.56 2120.0 62 71.1 4.99 37195 NaN NaN Eastern Mediterranean 41.48
168 63.17 15.1 Syrian Arab Republic 3.04 NaN 75 83.4 6.09 21890 NaN NaN Eastern Mediterranean 35.35
170 111.63 13.2 Thailand 1.43 8360.0 74 NaN 13.96 66785 NaN NaN South-East Asia 18.47
172 53.23 56.7 Timor-Leste 6.11 NaN 64 58.3 5.16 1114 85.6 86.2 South-East Asia 46.33
174 52.63 12.8 Tonga 3.86 5000.0 72 NaN 7.96 105 NaN NaN Western Pacific 37.33
176 116.93 16.1 Tunisia 2.04 9030.0 76 NaN 10.49 10875 NaN NaN Eastern Mediterranean 23.22
179 21.63 29.7 Tuvalu NaN NaN 64 NaN 9.07 10 NaN NaN Western Pacific 30.61
182 148.62 8.4 United Arab Emirates 1.84 47890.0 76 NaN 0.81 9206 NaN NaN Eastern Mediterranean 14.41
188 55.76 17.9 Vanuatu 3.46 4330.0 72 82.6 6.02 247 NaN NaN Western Pacific 37.37
190 143.39 23.0 Viet Nam 1.79 3250.0 75 93.2 9.32 90796 NaN NaN Western Pacific 22.87
191 47.05 60.0 Yemen 4.35 2170.0 64 63.9 4.54 23852 70.5 85.5 Eastern Mediterranean 40.72
  • nlargest() can also be called on the DataFrame itself. Here we use it to select the three rows with the largest values in the column “Population”.
In [347]:
df_who.nlargest(3, 'Population')
Out[347]:
CellularSubscribers ChildMortality Country FertilityRate GNI LifeExpectancy LiteracyRate Over60 Population PrimarySchoolEnrollmentFemale PrimarySchoolEnrollmentMale Region Under15
35 73.19 14.0 China 1.66 8390.0 76 94.3 13.42 1390000 NaN NaN Western Pacific 17.95
77 72.00 56.3 India 2.53 3590.0 65 NaN 8.10 1240000 NaN NaN South-East Asia 29.43
185 92.72 7.1 United States of America 2.00 48820.0 79 NaN 19.31 318000 96.1 95.4 Americas 19.63
  • When using keep='last', ties are resolved in favor of the last occurrences, so tied rows come back in reverse order:
In [348]:
df_who.nlargest(3, 'LifeExpectancy', keep='last')
Out[348]:
CellularSubscribers ChildMortality Country FertilityRate GNI LifeExpectancy LiteracyRate Over60 Population PrimarySchoolEnrollmentFemale PrimarySchoolEnrollmentMale Region Under15
167 131.43 4.3 Switzerland 1.51 52570.0 83 NaN 23.25 7997 99.5 98.9 Europe 14.79
147 111.75 3.3 San Marino NaN NaN 83 NaN 26.97 31 NaN NaN Europe 14.04
85 104.95 3.0 Japan 1.39 35330.0 83 NaN 31.92 127000 NaN NaN Western Pacific 13.12
  • When using keep='all', every row tied with the n-th largest value is kept, so the result may contain more than n rows:
In [349]:
df_who.nlargest(3, 'LifeExpectancy', keep='all')
Out[349]:
CellularSubscribers ChildMortality Country FertilityRate GNI LifeExpectancy LiteracyRate Over60 Population PrimarySchoolEnrollmentFemale PrimarySchoolEnrollmentMale Region Under15
85 104.95 3.0 Japan 1.39 35330.0 83 NaN 31.92 127000 NaN NaN Western Pacific 13.12
147 111.75 3.3 San Marino NaN NaN 83 NaN 26.97 31 NaN NaN Europe 14.04
167 131.43 4.3 Switzerland 1.51 52570.0 83 NaN 23.25 7997 99.5 98.9 Europe 14.79
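The effect of keep is easier to see when the tie falls exactly on the cut-off. The following is a minimal sketch on made-up scores (df_demo and the column names are assumptions); rows 1 and 2 tie for second place, and n is 2.

```python
import pandas as pd

# Hypothetical scores with a tie for second place (rows 1 and 2).
df_demo = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "score": [10, 9, 9, 8],
})

first = df_demo.nlargest(2, "score")                 # default keep='first'
last = df_demo.nlargest(2, "score", keep="last")     # prefer the last tied row
all_ties = df_demo.nlargest(2, "score", keep="all")  # keep every tied row

print(sorted(first["name"]))     # ['a', 'b']
print(sorted(last["name"]))      # ['a', 'c']
print(sorted(all_ties["name"]))  # ['a', 'b', 'c']
```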

nsmallest()

  • Get the rows of a DataFrame sorted by the n smallest values of columns.
In [350]:
df_who.Region.value_counts().nsmallest(3)
Out[350]:
South-East Asia          11
Eastern Mediterranean    22
Western Pacific          27
Name: Region, dtype: int64
  • And all we actually need from this Series is the index:
In [351]:
df_who.Region.value_counts().nsmallest(3).index
Out[351]:
Index(['South-East Asia', 'Eastern Mediterranean', 'Western Pacific'], dtype='object')
In [352]:
df_who[df_who.Region.isin(df_who.Region.value_counts().nsmallest(3).index)]
Out[352]:
CellularSubscribers ChildMortality Country FertilityRate GNI LifeExpectancy LiteracyRate Over60 Population PrimarySchoolEnrollmentFemale PrimarySchoolEnrollmentMale Region Under15
0 54.26 98.5 Afghanistan 5.40 1140.0 60 NaN 3.82 29825 NaN NaN Eastern Mediterranean 47.42
8 108.34 4.9 Australia 1.89 38110.0 82 NaN 19.46 23050 97.5 96.9 Western Pacific 18.95
12 127.96 9.6 Bahrain 2.12 NaN 79 91.9 3.38 1318 NaN NaN Eastern Mediterranean 20.16
13 56.06 40.9 Bangladesh 2.24 1940.0 70 56.8 6.89 155000 NaN NaN South-East Asia 30.57
19 65.58 44.6 Bhutan 2.32 5570.0 67 NaN 6.90 742 91.5 88.3 South-East Asia 28.53
24 109.17 8.0 Brunei Darussalam 2.03 NaN 77 95.2 7.03 412 NaN NaN Western Pacific 25.75
28 96.17 39.7 Cambodia 2.93 2230.0 65 NaN 7.67 14865 95.4 96.4 Western Pacific 31.23
35 73.19 14.0 China 1.66 8390.0 76 94.3 13.42 1390000 NaN NaN Western Pacific 17.95
39 NaN 10.6 Cook Islands NaN NaN 77 NaN 9.07 21 99.3 97.6 Western Pacific 30.61
46 4.09 28.8 Democratic People's Republic of Korea 2.00 NaN 69 NaN 12.74 24763 NaN NaN South-East Asia 21.98
49 21.32 80.9 Djibouti 3.53 NaN 58 NaN 5.96 860 NaN NaN Eastern Mediterranean 33.72
53 101.08 21.0 Egypt 2.85 6120.0 73 72.0 8.62 80722 NaN NaN Eastern Mediterranean 31.25
59 83.72 22.4 Fiji 2.64 4610.0 70 NaN 8.38 875 NaN NaN Western Pacific 28.88
77 72.00 56.3 India 2.53 3590.0 65 NaN 8.10 1240000 NaN NaN South-East Asia 29.43
78 103.09 31.0 Indonesia 2.40 4500.0 69 NaN 7.86 247000 NaN NaN South-East Asia 29.27
79 74.93 17.6 Iran (Islamic Republic of) 1.91 NaN 73 NaN 7.82 76424 NaN NaN Eastern Mediterranean 23.68
80 78.12 34.4 Iraq 4.15 3750.0 69 78.2 4.95 32778 NaN NaN Eastern Mediterranean 40.51
85 104.95 3.0 Japan 1.39 35330.0 83 NaN 31.92 127000 NaN NaN Western Pacific 13.12
86 118.20 19.1 Jordan 3.39 5930.0 74 92.6 5.30 7009 90.7 90.8 Eastern Mediterranean 34.13
89 13.64 59.9 Kiribati 3.01 3300.0 67 NaN 8.84 101 NaN NaN Western Pacific 30.10
90 175.09 11.0 Kuwait 2.65 NaN 80 NaN 3.80 3250 NaN NaN Eastern Mediterranean 24.90
92 87.16 71.8 Lao People's Democratic Republic 3.20 2580.0 68 NaN 5.76 6646 95.4 98.1 Western Pacific 35.61
94 78.65 9.3 Lebanon 1.50 14470.0 74 NaN 12.03 4647 92.9 93.5 Eastern Mediterranean 21.64
97 155.70 15.4 Libya 2.47 NaN 65 89.2 6.96 6155 NaN NaN Eastern Mediterranean 29.45
102 127.04 8.5 Malaysia 1.99 15650.0 74 93.1 8.21 29240 NaN NaN Western Pacific 26.65
103 165.72 10.5 Maldives 2.31 7430.0 77 NaN 6.65 338 96.5 96.5 South-East Asia 29.03
106 NaN 37.9 Marshall Islands NaN NaN 60 NaN 8.84 53 NaN NaN Western Pacific 30.10
110 NaN 38.5 Micronesia (Federated States of) 3.40 3580.0 69 NaN 6.67 103 NaN NaN Western Pacific 35.81
112 105.08 27.5 Mongolia 2.45 4290.0 68 97.4 5.80 2796 98.5 99.6 Western Pacific 27.05
114 113.26 31.1 Morocco 2.65 4880.0 72 NaN 7.61 32521 NaN NaN Eastern Mediterranean 27.85
116 2.57 52.3 Myanmar 1.98 NaN 65 92.3 8.15 52797 NaN NaN South-East Asia 25.28
118 65.00 37.1 Nauru NaN NaN 71 NaN 8.84 10 NaN NaN Western Pacific 30.10
119 43.81 41.6 Nepal 2.50 1260.0 68 60.3 7.65 27474 NaN NaN South-East Asia 35.58
121 109.19 5.7 New Zealand 2.10 NaN 81 NaN 19.01 4460 99.6 99.3 Western Pacific 20.26
125 NaN 25.1 Niue NaN NaN 72 NaN 9.07 1 NaN NaN Western Pacific 30.61
127 168.97 11.6 Oman 2.90 NaN 72 NaN 3.99 3314 NaN NaN Eastern Mediterranean 24.19
128 61.61 85.9 Pakistan 3.35 2870.0 67 NaN 6.44 179000 66.5 81.3 Eastern Mediterranean 34.31
129 74.94 20.8 Palau NaN 11080.0 72 NaN 8.84 21 NaN NaN Western Pacific 30.10
131 34.22 63.0 Papua New Guinea 3.90 2570.0 63 60.6 4.79 7167 NaN NaN Western Pacific 38.37
134 99.30 29.8 Philippines 3.11 4140.0 69 NaN 6.21 96707 NaN NaN Western Pacific 34.53
137 123.11 7.4 Qatar 2.06 86440.0 82 96.3 1.73 2051 96.6 95.7 Eastern Mediterranean 13.28
138 108.50 3.8 Republic of Korea 1.29 30370.0 81 NaN 16.58 49003 98.4 99.3 Western Pacific 15.25
146 NaN 17.8 Samoa 4.28 4270.0 73 98.8 7.39 189 97.1 93.2 Western Pacific 37.88
149 191.24 8.6 Saudi Arabia 2.76 24700.0 76 86.6 4.59 28288 96.5 96.7 Eastern Mediterranean 29.69
154 150.24 2.9 Singapore 1.27 59380.0 82 95.9 15.13 5303 NaN NaN Western Pacific 16.48
157 49.77 31.1 Solomon Islands 4.17 2350.0 70 NaN 5.10 550 87.3 87.7 Western Pacific 40.37
158 6.85 147.4 Somalia 6.77 NaN 50 NaN 4.46 10195 NaN NaN Eastern Mediterranean 47.35
160 NaN 104.0 South Sudan 5.10 NaN 54 NaN 5.26 10838 NaN NaN Eastern Mediterranean 42.28
162 87.05 9.6 Sri Lanka 2.35 5520.0 75 91.2 12.40 21098 94.4 93.9 South-East Asia 25.15
163 56.14 73.1 Sudan 4.56 2120.0 62 71.1 4.99 37195 NaN NaN Eastern Mediterranean 41.48
168 63.17 15.1 Syrian Arab Republic 3.04 NaN 75 83.4 6.09 21890 NaN NaN Eastern Mediterranean 35.35
170 111.63 13.2 Thailand 1.43 8360.0 74 NaN 13.96 66785 NaN NaN South-East Asia 18.47
172 53.23 56.7 Timor-Leste 6.11 NaN 64 58.3 5.16 1114 85.6 86.2 South-East Asia 46.33
174 52.63 12.8 Tonga 3.86 5000.0 72 NaN 7.96 105 NaN NaN Western Pacific 37.33
176 116.93 16.1 Tunisia 2.04 9030.0 76 NaN 10.49 10875 NaN NaN Eastern Mediterranean 23.22
179 21.63 29.7 Tuvalu NaN NaN 64 NaN 9.07 10 NaN NaN Western Pacific 30.61
182 148.62 8.4 United Arab Emirates 1.84 47890.0 76 NaN 0.81 9206 NaN NaN Eastern Mediterranean 14.41
188 55.76 17.9 Vanuatu 3.46 4330.0 72 82.6 6.02 247 NaN NaN Western Pacific 37.37
190 143.39 23.0 Viet Nam 1.79 3250.0 75 93.2 9.32 90796 NaN NaN Western Pacific 22.87
191 47.05 60.0 Yemen 4.35 2170.0 64 63.9 4.54 23852 70.5 85.5 Eastern Mediterranean 40.72
In [353]:
df_who[~df_who.Region.isin(df_who.Region.value_counts().nsmallest(3).index)]
Out[353]:
CellularSubscribers ChildMortality Country FertilityRate GNI LifeExpectancy LiteracyRate Over60 Population PrimarySchoolEnrollmentFemale PrimarySchoolEnrollmentMale Region Under15
1 96.39 16.7 Albania 1.75 8820.0 74 NaN 14.93 3162 NaN NaN Europe 21.33
2 98.99 20.0 Algeria 2.83 8310.0 73 NaN 7.17 38482 96.4 98.2 Africa 27.42
3 75.49 3.2 Andorra NaN NaN 82 NaN 22.86 78 79.4 78.4 Europe 15.20
4 48.38 163.5 Angola 6.10 5230.0 51 70.1 3.84 20821 78.2 93.1 Africa 47.58
5 196.41 9.9 Antigua and Barbuda 2.12 17900.0 75 99.0 12.35 89 84.5 91.1 Americas 25.96
6 134.92 14.2 Argentina 2.20 17130.0 76 97.8 14.97 41087 NaN NaN Americas 24.42
7 103.57 16.4 Armenia 1.74 6100.0 71 99.6 14.06 2969 NaN NaN Europe 20.34
9 154.78 4.0 Austria 1.44 42050.0 81 NaN 23.52 8464 NaN NaN Europe 14.51
10 108.75 35.2 Azerbaijan 1.96 8960.0 71 NaN 8.24 9309 84.1 85.3 Europe 22.25
11 86.06 16.9 Bahamas 1.90 NaN 75 NaN 11.24 372 NaN NaN Americas 21.62
14 127.01 18.4 Barbados 1.84 NaN 78 NaN 15.78 283 NaN NaN Americas 18.99
15 111.88 5.2 Belarus 1.47 14460.0 71 NaN 19.31 9405 NaN NaN Europe 15.10
16 116.61 4.2 Belgium 1.85 39190.0 80 NaN 23.81 11060 99.2 98.9 Europe 16.88
17 69.96 18.3 Belize 2.76 6090.0 74 NaN 5.74 324 NaN NaN Americas 34.40
18 85.33 89.5 Benin 5.01 1620.0 57 42.4 4.54 10051 NaN NaN Africa 42.95
20 82.82 41.4 Bolivia (Plurinational State of) 3.31 4890.0 67 NaN 7.28 10496 91.5 91.2 Americas 35.23
21 84.52 6.7 Bosnia and Herzegovina 1.26 9190.0 76 97.9 20.52 3834 88.4 86.5 Europe 16.35
22 142.82 53.3 Botswana 2.71 14550.0 66 84.5 5.63 2004 NaN NaN Africa 33.75
23 124.26 14.4 Brazil 1.82 11420.0 74 NaN 10.81 199000 NaN NaN Americas 24.56
25 140.68 12.1 Bulgaria 1.51 14160.0 74 NaN 26.11 7278 99.7 99.3 Europe 13.53
26 45.27 102.4 Burkina Faso 5.78 1300.0 56 NaN 3.88 16460 55.9 60.7 Africa 45.66
27 22.33 104.3 Burundi 6.21 610.0 53 67.2 3.87 9850 NaN NaN Africa 44.20
29 52.35 94.9 Cameroon 4.94 2330.0 53 NaN 4.89 21700 87.4 99.6 Africa 43.08
30 79.73 5.3 Canada 1.66 39660.0 82 NaN 20.82 34838 NaN NaN Americas 16.37
31 79.19 22.2 Cape Verde 2.38 3980.0 72 84.3 7.05 494 92.4 94.6 Africa 30.17
32 40.65 128.6 Central African Republic 4.54 810.0 48 56.0 5.74 4525 60.6 81.3 Africa 40.07
33 31.80 149.8 Chad 6.49 1360.0 51 34.5 3.80 12448 NaN NaN Africa 48.52
34 129.71 9.1 Chile 1.84 16330.0 79 NaN 13.80 17465 94.4 94.3 Americas 21.38
36 98.45 17.6 Colombia 2.35 9560.0 78 93.4 9.19 47704 91.3 91.7 Americas 28.03
37 28.71 77.6 Comoros 4.85 1110.0 62 74.9 4.50 718 NaN NaN Africa 42.17
... ... ... ... ... ... ... ... ... ... ... ... ... ...
147 111.75 3.3 San Marino NaN NaN 83 NaN 26.97 31 NaN NaN Europe 14.04
148 68.26 53.2 Sao Tome and Principe 4.22 2080.0 63 89.2 4.76 188 NaN NaN Africa 41.60
150 73.25 59.6 Senegal 5.02 1940.0 61 NaN 4.57 13726 80.2 75.9 Africa 43.54
151 125.39 6.6 Serbia 1.37 11540.0 74 97.9 20.52 9553 94.4 94.7 Europe 16.45
152 145.71 13.1 Seychelles 2.23 25140.0 74 91.8 10.05 92 NaN NaN Africa 21.95
153 35.63 181.6 Sierra Leone 4.86 840.0 47 42.1 4.41 5979 NaN NaN Africa 41.74
155 109.35 7.5 Slovakia 1.37 22130.0 76 NaN 18.60 5446 NaN NaN Europe 15.00
156 106.56 3.1 Slovenia 1.49 26510.0 80 99.7 23.16 2068 97.3 97.7 Europe 14.16
159 126.83 44.6 South Africa 2.44 10710.0 58 NaN 8.44 52386 NaN NaN Africa 29.53
161 113.22 4.5 Spain 1.47 31400.0 82 97.7 22.86 46755 99.8 99.7 Europe 15.20
164 178.88 20.8 Suriname 2.32 NaN 72 94.7 9.55 535 NaN NaN Americas 27.83
165 63.70 79.7 Swaziland 3.48 5930.0 50 87.4 5.34 1231 NaN NaN Africa 38.05
166 118.57 2.9 Sweden 1.93 42200.0 82 NaN 25.32 9511 99.0 99.7 Europe 16.71
167 131.43 4.3 Switzerland 1.51 52570.0 83 NaN 23.25 7997 99.5 98.9 Europe 14.79
169 90.64 58.3 Tajikistan 3.81 2300.0 68 99.7 4.80 8009 96.0 99.5 Europe 35.75
171 107.24 7.4 The former Yugoslav Republic of Macedonia 1.44 11090.0 75 97.3 17.56 2106 99.2 97.3 Europe 16.89
173 50.45 95.5 Togo 4.75 1040.0 56 NaN 4.44 6643 NaN NaN Africa 41.89
175 135.64 20.7 Trinidad and Tobago 1.80 NaN 71 98.8 13.18 1337 97.0 97.7 Americas 20.73
177 88.70 14.2 Turkey 2.08 16940.0 76 NaN 10.56 73997 98.3 99.5 Europe 26.00
178 68.77 52.8 Turkmenistan 2.38 8690.0 63 99.6 6.30 5173 NaN NaN Europe 28.65
180 48.38 68.9 Uganda 6.06 1310.0 56 73.2 3.72 36346 92.3 89.7 Africa 48.54
181 122.98 10.7 Ukraine 1.45 7040.0 71 99.7 20.76 45530 91.5 90.8 Europe 14.18
183 130.75 4.8 United Kingdom 1.90 36010.0 80 NaN 23.06 62783 99.6 99.8 Europe 17.54
184 55.53 54.0 United Republic of Tanzania 5.36 1500.0 59 73.2 4.89 47783 NaN NaN Africa 44.85
185 92.72 7.1 United States of America 2.00 48820.0 79 NaN 19.31 318000 96.1 95.4 Americas 19.63
186 140.75 7.2 Uruguay 2.07 14640.0 77 98.1 18.59 3395 NaN NaN Americas 22.05
187 91.65 39.6 Uzbekistan 2.38 3420.0 68 99.4 6.38 28541 91.0 93.3 Europe 28.90
189 97.78 15.3 Venezuela (Bolivarian Republic of) 2.44 12430.0 75 NaN 9.17 29955 95.1 94.7 Americas 28.84
192 60.59 88.5 Zambia 5.77 1490.0 55 71.2 3.95 14075 93.9 91.4 Africa 46.73
193 72.13 89.8 Zimbabwe 3.64 NaN 54 92.2 5.68 13724 NaN NaN Africa 40.24

134 rows × 13 columns

In [354]:
df_who.nsmallest(3, 'Population')
Out[354]:
CellularSubscribers ChildMortality Country FertilityRate GNI LifeExpectancy LiteracyRate Over60 Population PrimarySchoolEnrollmentFemale PrimarySchoolEnrollmentMale Region Under15
125 NaN 25.1 Niue NaN NaN 72 NaN 9.07 1 NaN NaN Western Pacific 30.61
118 65.00 37.1 Nauru NaN NaN 71 NaN 8.84 10 NaN NaN Western Pacific 30.10
179 21.63 29.7 Tuvalu NaN NaN 64 NaN 9.07 10 NaN NaN Western Pacific 30.61
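For intuition, DataFrame.nsmallest(n, column) gives the same rows as sorting by that column in ascending order and taking the first n. The sketch below checks this on made-up data with distinct values (df_demo and the numbers are assumptions, chosen so that ties do not affect the comparison).

```python
import pandas as pd

# Hypothetical populations; values are distinct on purpose.
df_demo = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Population": [500, 20, 3000, 75],
})

smallest = df_demo.nsmallest(2, "Population")
via_sort = df_demo.sort_values("Population").head(2)

print(smallest["Country"].tolist())  # ['B', 'D']
print(smallest.equals(via_sort))     # True
```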

between

In [355]:
df_who = pd.read_csv('WHO_csv.csv')
In [356]:
df_who[df_who['Population'].between(1, 500)]
Out[356]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
14 Barbados Americas 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
17 Belize Americas 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
24 Brunei Darussalam Western Pacific 412 25.75 7.03 2.03 77 8.0 109.17 95.2 NaN NaN NaN
31 Cape Verde Africa 494 30.17 7.05 2.38 72 22.2 79.19 84.3 3980.0 94.6 92.4
39 Cook Islands Western Pacific 21 30.61 9.07 NaN 77 10.6 NaN NaN NaN 97.6 99.3
50 Dominica Americas 72 25.96 12.35 NaN 74 12.6 164.02 NaN 13000.0 NaN NaN
68 Grenada Americas 105 26.96 9.72 2.22 74 13.5 NaN NaN 10350.0 NaN NaN
76 Iceland Europe 326 20.71 17.62 2.11 82 2.3 106.08 NaN 31020.0 98.8 99.2
89 Kiribati Western Pacific 101 30.10 8.84 3.01 67 59.9 13.64 NaN 3300.0 NaN NaN
103 Maldives South-East Asia 338 29.03 6.65 2.31 77 10.5 165.72 NaN 7430.0 96.5 96.5
105 Malta Europe 428 14.98 22.87 1.37 80 6.8 124.86 NaN NaN 93.3 94.3
106 Marshall Islands Western Pacific 53 30.10 8.84 NaN 60 37.9 NaN NaN NaN NaN NaN
110 Micronesia (Federated States of) Western Pacific 103 35.81 6.67 3.40 69 38.5 NaN NaN 3580.0 NaN NaN
111 Monaco Europe 38 18.26 23.82 NaN 82 3.8 89.73 NaN NaN NaN NaN
118 Nauru Western Pacific 10 30.10 8.84 NaN 71 37.1 65.00 NaN NaN NaN NaN
125 Niue Western Pacific 1 30.61 9.07 NaN 72 25.1 NaN NaN NaN NaN NaN
129 Palau Western Pacific 21 30.10 8.84 NaN 72 20.8 74.94 NaN 11080.0 NaN NaN
143 Saint Kitts and Nevis Americas 54 25.96 12.35 NaN 74 9.2 NaN NaN 16470.0 85.8 86.2
144 Saint Lucia Americas 181 24.31 12.13 1.96 75 17.5 123.00 NaN 11220.0 90.2 89.2
145 Saint Vincent and the Grenadines Americas 109 25.70 9.92 2.05 74 23.4 120.52 NaN 10440.0 NaN NaN
146 Samoa Western Pacific 189 37.88 7.39 4.28 73 17.8 NaN 98.8 4270.0 93.2 97.1
147 San Marino Europe 31 14.04 26.97 NaN 83 3.3 111.75 NaN NaN NaN NaN
148 Sao Tome and Principe Africa 188 41.60 4.76 4.22 63 53.2 68.26 89.2 2080.0 NaN NaN
152 Seychelles Africa 92 21.95 10.05 2.23 74 13.1 145.71 91.8 25140.0 NaN NaN
174 Tonga Western Pacific 105 37.33 7.96 3.86 72 12.8 52.63 NaN 5000.0 NaN NaN
179 Tuvalu Western Pacific 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN
188 Vanuatu Western Pacific 247 37.37 6.02 3.46 72 17.9 55.76 82.6 4330.0 NaN NaN
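The between() call above returns a boolean Series that is True where the value lies in the given range, with both bounds included by default. A minimal sketch on a made-up Series (the values and the names population, mask and manual are assumptions) shows that it matches the two explicit comparisons:

```python
import pandas as pd

# Hypothetical values placed on and around the bounds 1 and 500.
population = pd.Series([1, 78, 500, 501], name="Population")

mask = population.between(1, 500)                 # both bounds included by default
manual = (population >= 1) & (population <= 500)  # equivalent explicit comparison

print(mask.tolist())        # [True, True, True, False]
print(mask.equals(manual))  # True
```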

11.19 Arithmetic Operations

In [357]:
df = pd.DataFrame({
     'a': [4, 5, 6, 7],
     'b': [10, 20, 30, 40],
     'c': [100, 50, -30, -50]
})
df
Out[357]:
a b c
0 4 10 100
1 5 20 50
2 6 30 -30
3 7 40 -50

add

  • Addition of dataframe and other, element-wise (binary operator add).

Example-1

Method-1

In [358]:
df+1
Out[358]:
a b c
0 5 11 101
1 6 21 51
2 7 31 -29
3 8 41 -49

Method-2

In [359]:
df.add(1)
Out[359]:
a b c
0 5 11 101
1 6 21 51
2 7 31 -29
3 8 41 -49

Example-2

Method-1

In [360]:
df + [1, 1,1]
Out[360]:
a b c
0 5 11 101
1 6 21 51
2 7 31 -29
3 8 41 -49

Method-2

In [361]:
df.add([1,1,1])
Out[361]:
a b c
0 5 11 101
1 6 21 51
2 7 31 -29
3 8 41 -49

Example-3

In [362]:
df.add([1,2,3], axis='columns')
Out[362]:
a b c
0 5 12 103
1 6 22 53
2 7 32 -27
3 8 42 -47

Example-4 (add two data frames)

In [363]:
df1 = pd.DataFrame({
     'a': [4, 5, 6, 7],
     'b': [10, 20, 30, 40],
     'c': [100, 50, -30, -50]
})
df1
Out[363]:
a b c
0 4 10 100
1 5 20 50
2 6 30 -30
3 7 40 -50
In [364]:
df2 = pd.DataFrame({
     'a': [1, 2, 3, 4],
     'b': [5, 6, 7, 8],
     'c': [9, 10, 11, 12]
})
df2
Out[364]:
a b c
0 1 5 9
1 2 6 10
2 3 7 11
3 4 8 12

Method-1

In [365]:
df1+df2
Out[365]:
a b c
0 5 15 109
1 7 26 60
2 9 37 -19
3 11 48 -38

Method-2

In [366]:
df1.add(df2)
Out[366]:
a b c
0 5 15 109
1 7 26 60
2 9 37 -19
3 11 48 -38
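
The add method also takes axis and fill_value arguments, which the + operator cannot express. The following is a minimal sketch, assuming the same df as above; the names row_offsets and partial are made up for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [4, 5, 6, 7],
    'b': [10, 20, 30, 40],
    'c': [100, 50, -30, -50]
})

# axis='index' matches the list against the rows instead of the columns
row_offsets = [1, 2, 3, 4]
by_row = df.add(row_offsets, axis='index')

# a second (hypothetical) frame that covers only part of df
partial = pd.DataFrame({'a': [1, 1]}, index=[0, 1])

plain = df + partial                    # cells missing from partial become NaN
filled = df.add(partial, fill_value=0)  # missing cells are treated as 0

print(by_row)
print(plain)
print(filled)
```

With fill_value the result keeps every original value and only changes the cells that partial actually covers.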

subtract

In [367]:
df = pd.DataFrame({
     'a': [4, 5, 6, 7],
     'b': [10, 20, 30, 40],
     'c': [100, 50, -30, -50]
})
df
Out[367]:
a b c
0 4 10 100
1 5 20 50
2 6 30 -30
3 7 40 -50

Example-1

Method-1

In [368]:
df-1
Out[368]:
a b c
0 3 9 99
1 4 19 49
2 5 29 -31
3 6 39 -51

Method-2

In [369]:
df.sub(1)
Out[369]:
a b c
0 3 9 99
1 4 19 49
2 5 29 -31
3 6 39 -51

Method-3

In [370]:
df.subtract(1)
Out[370]:
a b c
0 3 9 99
1 4 19 49
2 5 29 -31
3 6 39 -51

Example-2

Method-1

In [371]:
df - [1, 1,1]
Out[371]:
a b c
0 3 9 99
1 4 19 49
2 5 29 -31
3 6 39 -51

Method-2

In [372]:
df.sub([1, 1,1])
Out[372]:
a b c
0 3 9 99
1 4 19 49
2 5 29 -31
3 6 39 -51

Method-3

In [373]:
df.subtract([1, 1,1])
Out[373]:
a b c
0 3 9 99
1 4 19 49
2 5 29 -31
3 6 39 -51

Example-3

Method-1

In [374]:
df.sub([1, 2,3], axis='columns')
Out[374]:
a b c
0 3 8 97
1 4 18 47
2 5 28 -33
3 6 38 -53

Method-2

In [375]:
df.subtract([1, 2,3], axis='columns')
Out[375]:
a b c
0 3 8 97
1 4 18 47
2 5 28 -33
3 6 38 -53

Example-4 (subtract two data frames)

In [376]:
df1 = pd.DataFrame({
     'a': [4, 5, 6, 7],
     'b': [10, 20, 30, 40],
     'c': [100, 50, -30, -50]
})
df1
Out[376]:
a b c
0 4 10 100
1 5 20 50
2 6 30 -30
3 7 40 -50
In [377]:
df2 = pd.DataFrame({
     'a': [1, 2, 3, 4],
     'b': [5, 6, 7, 8],
     'c': [9, 10, 11, 12]
})
df2
Out[377]:
a b c
0 1 5 9
1 2 6 10
2 3 7 11
3 4 8 12

Method-1

In [378]:
df1-df2
Out[378]:
a b c
0 3 5 91
1 3 14 40
2 3 23 -41
3 3 32 -62

Method-2

In [379]:
df1.sub(df2)
Out[379]:
a b c
0 3 5 91
1 3 14 40
2 3 23 -41
3 3 32 -62

Method-3

In [380]:
df1.subtract(df2)
Out[380]:
a b c
0 3 5 91
1 3 14 40
2 3 23 -41
3 3 32 -62

shift

  • Shift index by desired number of periods with an optional time freq

Example-1

In [381]:
df= pd.read_csv('egg_price.csv')
df
Out[381]:
Date Price
0 3-Dec-18 106
1 4-Dec-18 120
2 5-Dec-18 100
3 6-Dec-18 100
4 7-Dec-18 120
5 10-Dec-18 116
6 11-Dec-18 125
7 12-Dec-18 106
8 13-Dec-18 112
9 14-Dec-18 128
In [382]:
df= pd.read_csv('egg_price.csv',parse_dates=['Date'],index_col='Date')
df
Out[382]:
Price
Date
2018-12-03 106
2018-12-04 120
2018-12-05 100
2018-12-06 100
2018-12-07 120
2018-12-10 116
2018-12-11 125
2018-12-12 106
2018-12-13 112
2018-12-14 128
In [383]:
# Price of eggs on the previous day (previous row)
df['previous_day']=df['Price'].shift(1)
df
Out[383]:
Price previous_day
Date
2018-12-03 106 NaN
2018-12-04 120 106.0
2018-12-05 100 120.0
2018-12-06 100 100.0
2018-12-07 120 100.0
2018-12-10 116 120.0
2018-12-11 125 116.0
2018-12-12 106 125.0
2018-12-13 112 106.0
2018-12-14 128 112.0
In [384]:
# Price of eggs on the next day (next row)
df['next_day']=df['Price'].shift(-1)
df
Out[384]:
Price previous_day next_day
Date
2018-12-03 106 NaN 120.0
2018-12-04 120 106.0 100.0
2018-12-05 100 120.0 100.0
2018-12-06 100 100.0 120.0
2018-12-07 120 100.0 116.0
2018-12-10 116 120.0 125.0
2018-12-11 125 116.0 106.0
2018-12-12 106 125.0 112.0
2018-12-13 112 106.0 128.0
2018-12-14 128 112.0 NaN
In [385]:
# Note: despite the column name, this is the previous day's price minus the next day's price
df['1daychange']=df['previous_day']-df['next_day']
df
Out[385]:
Price previous_day next_day 1daychange
Date
2018-12-03 106 NaN 120.0 NaN
2018-12-04 120 106.0 100.0 6.0
2018-12-05 100 120.0 100.0 20.0
2018-12-06 100 100.0 120.0 -20.0
2018-12-07 120 100.0 116.0 -16.0
2018-12-10 116 120.0 125.0 -5.0
2018-12-11 125 116.0 106.0 10.0
2018-12-12 106 125.0 112.0 13.0
2018-12-13 112 106.0 128.0 -22.0
2018-12-14 128 112.0 NaN NaN
In [386]:
df['5days%return']= (df['Price']-df['Price'].shift(5))*100/df['Price'].shift(5)
df
Out[386]:
Price previous_day next_day 1daychange 5days%return
Date
2018-12-03 106 NaN 120.0 NaN NaN
2018-12-04 120 106.0 100.0 6.0 NaN
2018-12-05 100 120.0 100.0 20.0 NaN
2018-12-06 100 100.0 120.0 -20.0 NaN
2018-12-07 120 100.0 116.0 -16.0 NaN
2018-12-10 116 120.0 125.0 -5.0 9.433962
2018-12-11 125 116.0 106.0 10.0 4.166667
2018-12-12 106 125.0 112.0 13.0 6.000000
2018-12-13 112 106.0 128.0 -22.0 12.000000
2018-12-14 128 112.0 NaN NaN 6.666667
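
A change against the previous row can also be written directly with shift(1). The sketch below assumes the first five prices of the egg price data, typed inline so that the CSV file is not needed. It also checks the hand-written percentage formula against pct_change. The names day_change, manual_return and builtin_return are illustrative.

```python
import numpy as np
import pandas as pd

# first five prices from the egg price example, typed inline
price = pd.Series(
    [106, 120, 100, 100, 120],
    index=pd.to_datetime(['2018-12-03', '2018-12-04', '2018-12-05',
                          '2018-12-06', '2018-12-07']),
    name='Price',
)

# change versus the previous row
day_change = price - price.shift(1)

# percentage return versus the previous row, written out by hand
manual_return = (price - price.shift(1)) * 100 / price.shift(1)

# pct_change returns the same ratio as a fraction, so scale it by 100
builtin_return = price.pct_change(periods=1) * 100

print(day_change)
print(np.allclose(manual_return, builtin_return, equal_nan=True))
```

The first row is NaN in every result because there is no earlier row to compare with.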

diff

  • First discrete difference of element.
  • Calculates the difference of a DataFrame element compared with another element in the DataFrame (default is the element in the same column of the previous row).
In [387]:
df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                    'b': [1, 1, 2, 3, 5, 8],
                    'c': [1, 4, 9, 16, 25, 36]})
df
Out[387]:
a b c
0 1 1 1
1 2 1 4
2 3 2 9
3 4 3 16
4 5 5 25
5 6 8 36
In [388]:
df.diff()
Out[388]:
a b c
0 NaN NaN NaN
1 1.0 0.0 3.0
2 1.0 1.0 5.0
3 1.0 1.0 7.0
4 1.0 2.0 9.0
5 1.0 3.0 11.0
  • Difference with previous column
In [389]:
df.diff(axis=1)
Out[389]:
a b c
0 NaN 0.0 0.0
1 NaN -1.0 3.0
2 NaN -1.0 7.0
3 NaN -1.0 13.0
4 NaN 0.0 20.0
5 NaN 2.0 28.0
  • Difference with 3rd previous row
In [390]:
df.diff(periods=3)
Out[390]:
a b c
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 3.0 2.0 15.0
4 3.0 4.0 21.0
5 3.0 6.0 27.0
  • Difference with following row
In [391]:
df.diff(periods=-1)
Out[391]:
a b c
0 -1.0 0.0 -3.0
1 -1.0 -1.0 -5.0
2 -1.0 -1.0 -7.0
3 -1.0 -2.0 -9.0
4 -1.0 -3.0 -11.0
5 NaN NaN NaN
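
One way to read diff is as the frame minus a shifted copy of itself. The following sketch, using the same df as above, checks that reading for the three periods values shown; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 1, 2, 3, 5, 8],
                   'c': [1, 4, 9, 16, 25, 36]})

# diff(periods=n) compared with the frame minus itself shifted by n rows
same_default = df.diff().equals(df - df.shift(1))
same_three = df.diff(periods=3).equals(df - df.shift(3))
same_negative = df.diff(periods=-1).equals(df - df.shift(-1))

print(same_default, same_three, same_negative)
```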

multiply

In [392]:
df = pd.DataFrame({
     'a': [4, 5, 6, 7],
     'b': [10, 20, 30, 40],
     'c': [100, 50, -30, -50]
})
df
Out[392]:
a b c
0 4 10 100
1 5 20 50
2 6 30 -30
3 7 40 -50

Example-1

Method-1

In [393]:
df*2
Out[393]:
a b c
0 8 20 200
1 10 40 100
2 12 60 -60
3 14 80 -100

Method-2

In [394]:
df.mul(2)
Out[394]:
a b c
0 8 20 200
1 10 40 100
2 12 60 -60
3 14 80 -100

Method-3

In [395]:
df.multiply(2)
Out[395]:
a b c
0 8 20 200
1 10 40 100
2 12 60 -60
3 14 80 -100

Example-2

Method-1

In [396]:
df*[1,2,3]
Out[396]:
a b c
0 4 20 300
1 5 40 150
2 6 60 -90
3 7 80 -150

Method-2

In [397]:
df.mul([1,2,3])
Out[397]:
a b c
0 4 20 300
1 5 40 150
2 6 60 -90
3 7 80 -150

Method-3

In [398]:
df.multiply([1,2,3])
Out[398]:
a b c
0 4 20 300
1 5 40 150
2 6 60 -90
3 7 80 -150

Example-3

In [399]:
df = pd.DataFrame({
     'a': [4, 5, 6, 7],
     'b': [10, 20, 30, 40],
     'c': [100, 50, -30, -50]
})
df
Out[399]:
a b c
0 4 10 100
1 5 20 50
2 6 30 -30
3 7 40 -50

Method-1

In [400]:
df.mul([1,2,3],axis='columns')
Out[400]:
a b c
0 4 20 300
1 5 40 150
2 6 60 -90
3 7 80 -150

Method-2

In [401]:
df.multiply([1,2,3],axis='columns')
Out[401]:
a b c
0 4 20 300
1 5 40 150
2 6 60 -90
3 7 80 -150

Example-4 (multiply two data frames)

In [402]:
df1 = pd.DataFrame({
     'a': [4, 5, 6, 7],
     'b': [10, 20, 30, 40],
     'c': [100, 50, -30, -50]
})
df1
Out[402]:
a b c
0 4 10 100
1 5 20 50
2 6 30 -30
3 7 40 -50
In [403]:
df2 = pd.DataFrame({
     'a': [1, 2, 3, 4],
     'b': [5, 6, 7, 8],
     'c': [9, 10, 11, 12]
})
df2
Out[403]:
a b c
0 1 5 9
1 2 6 10
2 3 7 11
3 4 8 12

Method-1

In [404]:
df1*df2
Out[404]:
a b c
0 4 50 900
1 10 120 500
2 18 210 -330
3 28 320 -600

Method-2

In [405]:
df1.mul(df2)
Out[405]:
a b c
0 4 50 900
1 10 120 500
2 18 210 -330
3 28 320 -600

Method-3

In [406]:
df1.multiply(df2)
Out[406]:
a b c
0 4 50 900
1 10 120 500
2 18 210 -330
3 28 320 -600

dot

  • Matrix multiplication with DataFrame or Series objects.

Example-1

In [407]:
df1 = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 4]})
In [408]:
df1
Out[408]:
A B
0 1 3
1 2 4
In [409]:
df2 = pd.DataFrame({"A":[5, 6], 
                   "B":[7, 8]})
In [410]:
df2
Out[410]:
A B
0 5 7
1 6 8

np.dot(df1, df2)
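
A sketch of the matrix product for these two frames is given below. DataFrame.dot matches the column labels of the left frame against the index labels of the right frame, so calling df1.dot(df2) on the frames exactly as defined is expected to fail with a "not aligned" error. The sketch therefore shows two options: multiplying the raw values, and relabelling the rows of df2 first. The names raw_product, df2_aligned and labelled_product are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

# matrix product on the raw values, ignoring all labels
raw_product = df1.to_numpy() @ df2.to_numpy()

# relabel the rows of df2 as 'A', 'B' so they match the columns of df1
df2_aligned = df2.copy()
df2_aligned.index = df1.columns
labelled_product = df1.dot(df2_aligned)

print(raw_product)
print(labelled_product)
```

For example, the top-left element is (1)(5) + (3)(6) = 23.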

Example-2

In [411]:
df1 = pd.Series([7, 5, 6, 4, 9])
df1
Out[411]:
0    7
1    5
2    6
3    4
4    9
dtype: int64
In [412]:
df2= pd.Series([1, 2, 3, 10, 2]) 
df2
Out[412]:
0     1
1     2
2     3
3    10
4     2
dtype: int64

(7)(1) + (5)(2) + (6)(3) + (4)(10)+ (9)(2) = 7 + 10 + 18 + 40 + 18 = 93

In [413]:
df1.dot(df2)
Out[413]:
93

product

In [414]:
df = pd.DataFrame({"A":[5, 6], 
                   "B":[7, 8]})
df
Out[414]:
A B
0 5 7
1 6 8

Example-1

Method-1

In [415]:
df.prod(axis = 0)
Out[415]:
A    30
B    56
dtype: int64
In [416]:
df.prod(axis = 1)
Out[416]:
0    35
1    48
dtype: int64

Method-2

In [417]:
df.product(axis = 0)
Out[417]:
A    30
B    56
dtype: int64
In [418]:
df.product(axis = 1)
Out[418]:
0    35
1    48
dtype: int64

divide

Example-1

Method-1

In [419]:
df = pd.DataFrame({"A":[2, 4], 
                   "B":[6, 8]})
df
Out[419]:
A B
0 2 6
1 4 8
In [420]:
df/2
Out[420]:
A B
0 1.0 3.0
1 2.0 4.0

Method-2

In [421]:
df.div(2)
Out[421]:
A B
0 1.0 3.0
1 2.0 4.0

Method-3

In [422]:
df.divide(2)
Out[422]:
A B
0 1.0 3.0
1 2.0 4.0

Method-4

In [423]:
df.rdiv(2) 
#2/df
Out[423]:
A B
0 1.0 0.333333
1 0.5 0.250000

Example-2

In [424]:
df = pd.DataFrame({"A":[2, 4], 
                   "B":[6, 8]})
df
Out[424]:
A B
0 2 6
1 4 8

Method-1

In [425]:
df/[2,4]
Out[425]:
A B
0 1.0 1.5
1 2.0 2.0

Method-2

In [426]:
df.div([2,4])
Out[426]:
A B
0 1.0 1.5
1 2.0 2.0

Method-3

In [427]:
df.divide([2,4])
Out[427]:
A B
0 1.0 1.5
1 2.0 2.0

Method-4

In [428]:
df.rdiv([2,4])
Out[428]:
A B
0 1.0 0.666667
1 0.5 0.500000

Example-3

In [429]:
df = pd.DataFrame({"A":[2, 4], 
                   "B":[6, 8]})
df
Out[429]:
A B
0 2 6
1 4 8

Method-1

In [430]:
df.div([2,4],axis='columns')
Out[430]:
A B
0 1.0 1.5
1 2.0 2.0

Method-2

In [431]:
df.divide([2,4],axis='columns')
Out[431]:
A B
0 1.0 1.5
1 2.0 2.0

Method-3

In [432]:
df.rdiv([2,4],axis='columns')
Out[432]:
A B
0 1.0 0.666667
1 0.5 0.500000

Example-4 (div Two Data Frames)

In [433]:
df1 = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 4]})
df1
Out[433]:
A B
0 1 3
1 2 4
In [434]:
df2 = pd.DataFrame({"A":[5, 6], 
                   "B":[7, 8]})
df2
Out[434]:
A B
0 5 7
1 6 8

Method-1

In [435]:
df1/df2
Out[435]:
A B
0 0.200000 0.428571
1 0.333333 0.500000

Method-2

In [436]:
df1.div(df2)
Out[436]:
A B
0 0.200000 0.428571
1 0.333333 0.500000

Method-3

In [437]:
df1.divide(df2)
Out[437]:
A B
0 0.200000 0.428571
1 0.333333 0.500000

Method-4

In [438]:
df1.rdiv(df2)
Out[438]:
A B
0 5.0 2.333333
1 3.0 2.000000
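
The examples above divide along the columns. A common case for axis='index' is dividing every row by its own total. This is a minimal sketch with the same small frame; row_totals and shares are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4], "B": [6, 8]})

# one total per row: 8 and 12
row_totals = df.sum(axis=1)

# axis='index' lines the totals up with the rows
shares = df.div(row_totals, axis='index')

print(shares)
```

Plain df / row_totals would match the Series labels against the column names instead, which is not what is wanted here.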

floordiv

  • Integer division of dataframe and other, element-wise (binary operator floordiv).
In [439]:
df = pd.DataFrame({"A":[5, 3, None, 4], 
                   "B":[None, 2, 4, 3],  
                   "C":[4, 3, 8, 5], 
                   "D":[5, 4, 2, None]})
df
Out[439]:
A B C D
0 5.0 NaN 4 5.0
1 3.0 2.0 3 4.0
2 NaN 4.0 8 2.0
3 4.0 3.0 5 NaN
In [440]:
df.floordiv(2, fill_value = 50) 
Out[440]:
A B C D
0 2.0 25.0 2 2.0
1 1.0 1.0 1 2.0
2 25.0 2.0 4 1.0
3 2.0 1.0 2 25.0
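
The effect of fill_value is easier to see next to the same call without it. A minimal sketch on a single column follows; no_fill and with_fill are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 3, None, 4]})

no_fill = df.floordiv(2)                   # NaN stays NaN
with_fill = df.floordiv(2, fill_value=50)  # NaN is replaced by 50 before dividing

# without fill_value the method matches the // operator
same_as_operator = no_fill.equals(df // 2)

print(no_fill)
print(with_fill)
```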

mod

  • Modulo of dataframe and other, element-wise (binary operator mod).
In [441]:
df= pd.DataFrame({"A":[5, 3, None, 4], 
                   "B":[None, 2, 4, 3],  
                   "C":[4, 3, 7, 5], 
                   "D":[5, 4, 2, None]})
df
Out[441]:
A B C D
0 5.0 NaN 4 5.0
1 3.0 2.0 3 4.0
2 NaN 4.0 7 2.0
3 4.0 3.0 5 NaN
In [442]:
df.mod(3)
Out[442]:
A B C D
0 2.0 NaN 1 2.0
1 0.0 2.0 0 1.0
2 NaN 1.0 1 2.0
3 1.0 0.0 2 NaN
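
floordiv and mod belong together: the quotient times the divisor plus the remainder gives back the original value. The sketch below checks this on a small frame without missing values (an assumption made to keep the comparison exact).

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 3, 4], "C": [4, 7, 5]})

quotient = df.floordiv(3)
remainder = df.mod(3)

# quotient * 3 + remainder should reproduce df
rebuilt = quotient.mul(3).add(remainder)

print(remainder)
print(rebuilt.equals(df))
```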

pow

  • Exponential power of dataframe and other, element-wise (binary operator pow).

Example-1

In [443]:
df = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 4]})
df
Out[443]:
A B
0 1 3
1 2 4
In [444]:
df.pow(2)
Out[444]:
A B
0 1 9
1 4 16

Example-2

In [445]:
df1 = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 4]})
In [446]:
df1
Out[446]:
A B
0 1 3
1 2 4
In [447]:
df2 = pd.DataFrame({"A":[5, 6], 
                   "B":[7, 8]})
In [448]:
df2
Out[448]:
A B
0 5 7
1 6 8
In [449]:
df1.pow(df2)
Out[449]:
A B
0 1 2187
1 64 65536
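
As with the other arithmetic methods, pow has an operator form and a reversed form. A short sketch with the same frame is given below; squared and two_to_df are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

squared = df ** 2        # operator form of df.pow(2)
two_to_df = df.rpow(2)   # reversed form: 2 ** df

print(squared)
print(two_to_df)
```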

abs

  • Return a Series/DataFrame with absolute numeric value of each element.
  • This function only applies to elements that are all numeric.

Example-1

In [450]:
df=pd.DataFrame({
'Temperature':[-30,32,-34,36],
'Windspeed':[6,7,10,-12]
})
df
Out[450]:
Temperature Windspeed
0 -30 6
1 32 7
2 -34 10
3 36 -12
In [451]:
df['Windspeed'] = df['Windspeed'].abs()
df
Out[451]:
Temperature Windspeed
0 -30 6
1 32 7
2 -34 10
3 36 12

Example-2

In [452]:
df=pd.DataFrame({
'Temperature':[-30,32,-34,36,1.2 + 1j],
'Windspeed':[6,7,10,-12,1.2 + 1j]
})
df
Out[452]:
Temperature Windspeed
0 -30.0+0.0j 6.0+0.0j
1 32.0+0.0j 7.0+0.0j
2 -34.0+0.0j 10.0+0.0j
3 36.0+0.0j -12.0+0.0j
4 1.2+1.0j 1.2+1.0j
In [453]:
df=df.abs()
df
Out[453]:
Temperature Windspeed
0 30.00000 6.00000
1 32.00000 7.00000
2 34.00000 10.00000
3 36.00000 12.00000
4 1.56205 1.56205
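
Because abs only applies to numeric data, calling it on a frame that also holds a text column is expected to fail. One possible approach, sketched here, is to select the numeric columns first. The City column and its values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Lahore', 'Murree'],   # hypothetical text column
    'Temperature': [-30, 32],
    'Windspeed': [6, -12],
})

# pick only the numeric columns, then take absolute values of those
numeric_cols = df.select_dtypes(include='number').columns
df[numeric_cols] = df[numeric_cols].abs()

print(df)
```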

round

Example-1

In [454]:
df.round(0)
Out[454]:
Temperature Windspeed
0 30.0 6.0
1 32.0 7.0
2 34.0 10.0
3 36.0 12.0
4 2.0 2.0

Example-2

In [455]:
df.round({'Temperature': 0, 'Windspeed': 1})
Out[455]:
Temperature Windspeed
0 30.0 6.0
1 32.0 7.0
2 34.0 10.0
3 36.0 12.0
4 2.0 1.6

ceil

In [456]:
df_who= pd.read_csv('WHO_csv.csv')
df_who.head()
Out[456]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
In [457]:
df_who['Over60']=df_who['Over60'].apply(np.ceil)
df_who.head()
Out[457]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 4.0 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 15.0 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 8.0 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 23.0 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 4.0 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2

floor

In [458]:
df_who= pd.read_csv('WHO_csv.csv')
df_who.head()
Out[458]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
In [459]:
df_who['Over60']=df_who['Over60'].apply(np.floor)
df_who.head()
Out[459]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.0 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.0 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.0 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.0 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.0 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
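
np.ceil and np.floor can also be called on a Series directly, without apply. The sketch below uses the first five Over60 values typed inline, so WHO_csv.csv is not needed; ceiled and floored are illustrative names.

```python
import numpy as np
import pandas as pd

# first five Over60 values from the WHO data, typed inline
over60 = pd.Series([3.82, 14.93, 7.17, 22.86, 3.84], name='Over60')

ceiled = np.ceil(over60)     # round up, result is still a Series
floored = np.floor(over60)   # round down

print(ceiled)
print(floored)
```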

11.20 Logical Operations

eq

  • Flexible wrapper for the == operator.
  • eq() compares every element of the calling DataFrame with the corresponding element of the passed object.
  • It returns True for every element that is equal to the corresponding element of the passed object.
In [460]:
df1 = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 4]})
df1
Out[460]:
A B
0 1 3
1 2 4
In [461]:
df2 = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 8]})
df2
Out[461]:
A B
0 1 3
1 2 8
In [462]:
df1.eq(df2)
Out[462]:
A B
0 True True
1 True False

equals

  • Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.
In [463]:
df1 = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 4]})
df1
Out[463]:
A B
0 1 3
1 2 4
In [464]:
df2 = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 8]})
df2
Out[464]:
A B
0 1 3
1 2 8
In [465]:
df1.equals(df2)
Out[465]:
False
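
The difference between eq and equals shows up when the data contains NaN. A minimal sketch follows, with two identical frames that each hold one missing value; elementwise and whole_frame are illustrative names.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1.0, np.nan], "B": [3.0, 4.0]})
df2 = df1.copy()

elementwise = df1.eq(df2)      # cell by cell: NaN == NaN is False
whole_frame = df1.equals(df2)  # NaN in the same location counts as equal

print(elementwise)
print(whole_frame)
```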

ge

  • Flexible wrapper for the >= operator.
  • ge() compares every element of the calling DataFrame with the corresponding element of the passed object.
  • It returns True for every element that is greater than or equal to the corresponding element of the passed object.
In [466]:
df1 = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 4]})
df1
Out[466]:
A B
0 1 3
1 2 4
In [467]:
df2 = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 8]})
In [468]:
df1.ge(df2)
Out[468]:
A B
0 True True
1 True False

gt

  • Flexible wrapper for the > operator: returns True for every element that is greater than the corresponding element of the passed object.
In [469]:
df1 = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 4]})
df1
Out[469]:
A B
0 1 3
1 2 4
In [470]:
df2 = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 8]})
df2
Out[470]:
A B
0 1 3
1 2 8
In [471]:
df1.gt(df2)
Out[471]:
A B
0 False False
1 False False

le

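  • Flexible wrapper for the <= operator: returns True for every element that is less than or equal to the corresponding element of the passed object.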
In [472]:
df1 = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 4]})
df1
Out[472]:
A B
0 1 3
1 2 4
In [473]:
df2 = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 8]})
df2
Out[473]:
A B
0 1 3
1 2 8
In [474]:
df1.le(df2)
Out[474]:
A B
0 True True
1 True True

lt

  • Flexible wrapper for the < operator: returns True for every element that is less than the corresponding element of the passed object.
In [475]:
df1 = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 4]})
df1
Out[475]:
A B
0 1 3
1 2 4
In [476]:
df2 = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 8]})
df2
Out[476]:
A B
0 1 3
1 2 8
In [477]:
df1.lt(df2)
Out[477]:
A B
0 False False
1 False True

ne

  • Flexible wrapper for the != operator: returns True for every element that differs from the corresponding element of the passed object.
In [478]:
df1 = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 4]})
df1
Out[478]:
A B
0 1 3
1 2 4
In [479]:
df2 = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 8]})
df2
Out[479]:
A B
0 1 3
1 2 8
In [480]:
df1.ne(df2)
Out[480]:
A B
0 False False
1 False True
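
The boolean frame returned by ne can be used to locate what changed between two frames. This is one possible sketch with the same df1 and df2; the names changed, n_changed, changed_rows and changed_cols are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [1, 2], "B": [3, 8]})

changed = df1.ne(df2)

# number of differing cells
n_changed = int(changed.sum().sum())

# row labels and column labels that contain at least one difference
changed_rows = df1.index[changed.any(axis=1)].tolist()
changed_cols = df1.columns[changed.any(axis=0)].tolist()

print(n_changed, changed_rows, changed_cols)
```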

all, any and bool

  • all: Return whether all elements are True, potentially over an axis.
  • any: Return whether any element is True, potentially over an axis.
  • bool: Return the bool of a single-element pandas object.
    • The element must be a boolean scalar value, either True or False.
    • Raises a ValueError if the object does not have exactly 1 element, or if that element is not boolean.
In [481]:
df = pd.DataFrame()
df['x'] = [1,2,16]
df['y'] = [3,4,5]
In [482]:
df['z']=df['x'] < df['y']
df
Out[482]:
x y z
0 1 3 True
1 2 4 True
2 16 5 False
In [483]:
df['z'].all()
Out[483]:
False
In [484]:
df['z'].any()
Out[484]:
True
In [485]:
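# Note: without parentheses this only references the method, it does not call it.
# Calling df['z'].bool() would raise a ValueError here, because the Series has three elements.
df['z'].bool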
Out[485]:
<bound method NDFrame.bool of 0     True
1     True
2    False
Name: z, dtype: bool>
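
all and any also work on a whole DataFrame, per column by default or per row with axis=1. The sketch below uses a small made-up boolean frame called flags. For the single-element case it uses item() as a stand-in for bool(); that choice is an assumption made here, since bool() may not be available in newer pandas releases.

```python
import pandas as pd

# hypothetical boolean frame
flags = pd.DataFrame({'p': [True, True, False],
                      'q': [True, False, False]})

col_all = flags.all()        # one value per column
col_any = flags.any()
row_all = flags.all(axis=1)  # one value per row
row_any = flags.any(axis=1)

# a Series with exactly one element can be reduced to a plain scalar
single = pd.Series([True])
single_value = single.item()

print(col_all, col_any, row_all, row_any, single_value, sep='\n')
```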

11.21 Miscellaneous Operations

In [486]:
df_who= pd.read_csv('WHO_csv.csv')
df_who.head()
Out[486]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
In [487]:
from IPython.display import YouTubeVideo
YouTubeVideo('bofaC0IckHo',width=900, height=500)
Out[487]:

Upper Case

In [488]:
df_who['Country']=df_who.Country.str.upper()
df_who.head()
Out[488]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 AFGHANISTAN Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 ALBANIA Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 ALGERIA Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 ANDORRA Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 ANGOLA Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2

Lower Case

In [489]:
df_who['Region']=df_who.Region.str.lower()
df_who.head()
Out[489]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 AFGHANISTAN eastern mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 ALBANIA europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 ALGERIA africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 ANDORRA europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 ANGOLA africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2

capitalize

In [490]:
df = pd.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
In [491]:
df.str.capitalize()
Out[491]:
0                 Lower
1              Capitals
2    This is a sentence
3              Swapcase
dtype: object

title

In [492]:
df = pd.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
In [493]:
df.str.title()
Out[493]:
0                 Lower
1              Capitals
2    This Is A Sentence
3              Swapcase
dtype: object

swapcase

In [494]:
df = pd.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
In [495]:
df.str.swapcase()
Out[495]:
0                 LOWER
1              capitals
2    THIS IS A SENTENCE
3              sWaPcAsE
dtype: object

Concatenate Two Columns in Pandas

Method-1

In [496]:
df_who['Country_Region']=df_who.Country+'_'+df_who.Region
df_who.head()
Out[496]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale Country_Region
0 AFGHANISTAN eastern mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN AFGHANISTAN_eastern mediterranean
1 ALBANIA europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN ALBANIA_europe
2 ALGERIA africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4 ALGERIA_africa
3 ANDORRA europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4 ANDORRA_europe
4 ANGOLA africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2 ANGOLA_africa

Method-2

In [497]:
df_who= pd.read_csv('WHO_csv.csv')
df_who.head()
Out[497]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
In [498]:
df_who['c_r']=df_who['Country'].str.cat(df_who['Region'], sep=',')
df_who.head()
Out[498]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale c_r
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN Afghanistan,Eastern Mediterranean
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN Albania,Europe
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4 Algeria,Africa
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4 Andorra,Europe
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2 Angola,Africa
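
Both methods above join two text columns. When one column is numeric it has to be converted with astype(str) first. When values are missing, str.cat can substitute a placeholder through na_rep. The following sketch uses a small made-up frame; the missing country name and the placeholder 'unknown' are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Albania', 'Andorra', None],   # last name deliberately missing
    'Region': ['Europe', 'Europe', 'Africa'],
    'Population': [3162, 78, 20821],
})

# a numeric column must be converted to text before concatenation
df['country_pop'] = df['Country'] + '_' + df['Population'].astype(str)

# na_rep replaces missing values instead of producing a missing result
df['c_r'] = df['Country'].str.cat(df['Region'], sep=',', na_rep='unknown')

print(df)
```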

Insert Column in Pandas

In [499]:
df_who.insert(2,'Country_Region',df_who.Country+'_'+df_who.Region)
df_who.head()
Out[499]:
Country Region Country_Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale c_r
0 Afghanistan Eastern Mediterranean Afghanistan_Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN Afghanistan,Eastern Mediterranean
1 Albania Europe Albania_Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN Albania,Europe
2 Algeria Africa Algeria_Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4 Algeria,Africa
3 Andorra Europe Andorra_Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4 Andorra,Europe
4 Angola Africa Angola_Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2 Angola,Africa
In [500]:
df_who.insert(2,'Country_Region',df_who.Country+'_'+df_who.Region,allow_duplicates=True)
df_who.head()
Out[500]:
Country Region Country_Region Country_Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale c_r
0 Afghanistan Eastern Mediterranean Afghanistan_Eastern Mediterranean Afghanistan_Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN Afghanistan,Eastern Mediterranean
1 Albania Europe Albania_Europe Albania_Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN Albania,Europe
2 Algeria Africa Algeria_Africa Algeria_Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4 Algeria,Africa
3 Andorra Europe Andorra_Europe Andorra_Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4 Andorra,Europe
4 Angola Africa Angola_Africa Angola_Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2 Angola,Africa
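
The second call above needs allow_duplicates=True because insert refuses a column name that already exists. A minimal sketch of that behaviour on a small frame follows; the column name A_plus_B is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# insert modifies df in place; position 1 puts the new column between A and B
df.insert(1, 'A_plus_B', df['A'] + df['B'])

# inserting the same name again raises ValueError by default
try:
    df.insert(0, 'A_plus_B', 0)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

# allow_duplicates=True permits the repeated name
df.insert(0, 'A_plus_B', 0, allow_duplicates=True)

print(duplicate_rejected)
print(df)
```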

eval

  • Evaluate a string describing operations on DataFrame columns.
  • Operates on columns only, not specific rows or elements.
  • Because eval can run arbitrary code, passing user input to it can leave you vulnerable to code injection.
In [501]:
df = pd.DataFrame({"A":[1, 2], 
                   "B":[3, 4]})

df = df.eval('C=A + B')
df
Out[501]:
A B C
0 1 3 4
1 2 4 6
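
eval can also return a plain result instead of a new column, and it can read a local Python variable through the @ prefix. This is a short sketch with the same frame; factor, product and scaled are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# an expression without assignment returns a Series
product = df.eval('A * B')

# @ refers to a local Python variable
factor = 10
scaled = df.eval('C = A * @factor')   # returns a new frame, df is unchanged

print(product)
print(scaled)
```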

assign

In [502]:
df = pd.DataFrame({
     'a': [4, 5, 6, 7],
     'b': [10, 20, 30, 40],
     'c': [100, 50, -30, -50]
})
df
Out[502]:
a b c
0 4 10 100
1 5 20 50
2 6 30 -30
3 7 40 -50
In [503]:
df.assign(D=df['a']**2, E=df.b*2)
Out[503]:
a b c D E
0 4 10 100 16 20
1 5 20 50 25 40
2 6 30 -30 36 60
3 7 40 -50 49 80
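
assign returns a new frame and leaves the original untouched. It also accepts functions, so a later column can build on one created in the same call. The following is a minimal sketch; the column names D and E follow the example above and result is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [4, 5, 6, 7],
    'b': [10, 20, 30, 40],
})

# each lambda receives the frame as it stands at that point,
# so E can use the freshly created D
result = df.assign(
    D=lambda d: d['a'] ** 2,
    E=lambda d: d['D'] + d['b'],
)

print(result)
print(list(df.columns))   # the original frame still has only a and b
```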

query

Example-1

In [504]:
df = pd.DataFrame()
df['x'] = [11,2,16]
df['y'] = [3,4,15]
In [505]:
df.query('x > y')
Out[505]:
x y
0 11 3
2 16 15
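
Conditions in query can be combined with and/or, and local variables are available through the @ prefix. The sketch below, using the same small frame, compares the query with the equivalent boolean mask; threshold, big and same are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'x': [11, 2, 16], 'y': [3, 4, 15]})

threshold = 12
big = df.query('x > y and x > @threshold')

# the same filter written as a boolean mask
same = df[(df['x'] > df['y']) & (df['x'] > threshold)]

print(big)
```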

Example-2

In [506]:
df1=pd.DataFrame({
        'ID':['F2017313014','F2017313015','F2017313017'],
        'Marks':[98,97,99],
        'Name':['Umer','Ali','Raza']
                })
df1
Out[506]:
ID Marks Name
0 F2017313014 98 Umer
1 F2017313015 97 Ali
2 F2017313017 99 Raza
In [507]:
df2=pd.DataFrame({
        'ID':['F2017313014','F2017313018'],
        'age':[20,22],
        
        })
df2
Out[507]:
ID age
0 F2017313014 20
1 F2017313018 22
In [508]:
df=pd.merge(df1,df2,on="ID",how="outer",indicator=True)
df
Out[508]:
ID Marks Name age _merge
0 F2017313014 98.0 Umer 20.0 both
1 F2017313015 97.0 Ali NaN left_only
2 F2017313017 99.0 Raza NaN left_only
3 F2017313018 NaN NaN 22.0 right_only
In [509]:
df=df.query('_merge != "both"')
df
Out[509]:
ID Marks Name age _merge
1 F2017313015 97.0 Ali NaN left_only
2 F2017313017 99.0 Raza NaN left_only
3 F2017313018 NaN NaN 22.0 right_only

Drop Columns in Pandas

Method-1

drop

In [510]:
from IPython.display import YouTubeVideo
YouTubeVideo('gnUKkS964WQ',width=900, height=500)
Out[510]:
In [511]:
df_who.drop(['Country_Region'],axis=1,inplace=True)
df_who.head()
Out[511]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale c_r
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN Afghanistan,Eastern Mediterranean
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN Albania,Europe
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4 Algeria,Africa
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4 Andorra,Europe
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2 Angola,Africa

Method-2

pop

In [512]:
df_who.pop('Country')
df_who.head()
Out[512]:
Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale c_r
0 Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN Afghanistan,Eastern Mediterranean
1 Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN Albania,Europe
2 Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4 Algeria,Africa
3 Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4 Andorra,Europe
4 Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2 Angola,Africa
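
The two methods differ in what they return. drop gives back a new frame unless inplace=True is used. pop removes the column in place and returns it as a Series. A minimal sketch on a small made-up frame follows.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Albania', 'Andorra'],
    'Region': ['Europe', 'Europe'],
    'Population': [3162, 78],
})

# drop without inplace returns a new frame; df keeps all three columns
trimmed = df.drop(columns=['Region'])

# pop removes the column from df and hands it back
popped = df.pop('Population')

print(list(trimmed.columns))
print(popped)
print(list(df.columns))
```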

Drop Rows in Pandas

In [513]:
df_who= pd.read_csv('WHO_csv.csv')
df_who.drop(df_who.index[[0,2,4]],axis=0,inplace=True)
df_who.head()
Out[513]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
6 Argentina Americas 41087 24.42 14.97 2.20 76 14.2 134.92 97.8 17130.0 NaN NaN
7 Armenia Europe 2969 20.34 14.06 1.74 71 16.4 103.57 99.6 6100.0 NaN NaN
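
Rows can also be dropped by label with the index keyword, or filtered out with a condition. The sketch below uses the first four rows of the WHO data typed inline; the population cut-off of 1000 is an arbitrary value chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra'],
    'Population': [29825, 3162, 38482, 78],
})

# drop rows by index label; returns a new frame
by_label = df.drop(index=[0, 2])

# keep only the rows that satisfy a condition
by_condition = df[df['Population'] >= 1000]

# the old labels survive a drop; reset_index renumbers from 0
renumbered = by_label.reset_index(drop=True)

print(by_label)
print(by_condition)
print(renumbered)
```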

Sort Data Frame

In [514]:
from IPython.display import YouTubeVideo
YouTubeVideo('zY4doF6xSxY',width=900, height=500)
Out[514]:
In [515]:
df_who.sort_values('Population')
Out[515]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
125 Niue Western Pacific 1 30.61 9.07 NaN 72 25.1 NaN NaN NaN NaN NaN
179 Tuvalu Western Pacific 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN
118 Nauru Western Pacific 10 30.10 8.84 NaN 71 37.1 65.00 NaN NaN NaN NaN
129 Palau Western Pacific 21 30.10 8.84 NaN 72 20.8 74.94 NaN 11080.0 NaN NaN
39 Cook Islands Western Pacific 21 30.61 9.07 NaN 77 10.6 NaN NaN NaN 97.6 99.3
147 San Marino Europe 31 14.04 26.97 NaN 83 3.3 111.75 NaN NaN NaN NaN
111 Monaco Europe 38 18.26 23.82 NaN 82 3.8 89.73 NaN NaN NaN NaN
106 Marshall Islands Western Pacific 53 30.10 8.84 NaN 60 37.9 NaN NaN NaN NaN NaN
143 Saint Kitts and Nevis Americas 54 25.96 12.35 NaN 74 9.2 NaN NaN 16470.0 85.8 86.2
50 Dominica Americas 72 25.96 12.35 NaN 74 12.6 164.02 NaN 13000.0 NaN NaN
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
152 Seychelles Africa 92 21.95 10.05 2.23 74 13.1 145.71 91.8 25140.0 NaN NaN
89 Kiribati Western Pacific 101 30.10 8.84 3.01 67 59.9 13.64 NaN 3300.0 NaN NaN
110 Micronesia (Federated States of) Western Pacific 103 35.81 6.67 3.40 69 38.5 NaN NaN 3580.0 NaN NaN
68 Grenada Americas 105 26.96 9.72 2.22 74 13.5 NaN NaN 10350.0 NaN NaN
174 Tonga Western Pacific 105 37.33 7.96 3.86 72 12.8 52.63 NaN 5000.0 NaN NaN
145 Saint Vincent and the Grenadines Americas 109 25.70 9.92 2.05 74 23.4 120.52 NaN 10440.0 NaN NaN
144 Saint Lucia Americas 181 24.31 12.13 1.96 75 17.5 123.00 NaN 11220.0 90.2 89.2
148 Sao Tome and Principe Africa 188 41.60 4.76 4.22 63 53.2 68.26 89.2 2080.0 NaN NaN
146 Samoa Western Pacific 189 37.88 7.39 4.28 73 17.8 NaN 98.8 4270.0 93.2 97.1
188 Vanuatu Western Pacific 247 37.37 6.02 3.46 72 17.9 55.76 82.6 4330.0 NaN NaN
14 Barbados Americas 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
17 Belize Americas 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
76 Iceland Europe 326 20.71 17.62 2.11 82 2.3 106.08 NaN 31020.0 98.8 99.2
103 Maldives South-East Asia 338 29.03 6.65 2.31 77 10.5 165.72 NaN 7430.0 96.5 96.5
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
24 Brunei Darussalam Western Pacific 412 25.75 7.03 2.03 77 8.0 109.17 95.2 NaN NaN NaN
105 Malta Europe 428 14.98 22.87 1.37 80 6.8 124.86 NaN NaN 93.3 94.3
31 Cape Verde Africa 494 30.17 7.05 2.38 72 22.2 79.19 84.3 3980.0 94.6 92.4
... ... ... ... ... ... ... ... ... ... ... ... ... ...
181 Ukraine Europe 45530 14.18 20.76 1.45 71 10.7 122.98 99.7 7040.0 90.8 91.5
161 Spain Europe 46755 15.20 22.86 1.47 82 4.5 113.22 97.7 31400.0 99.7 99.8
36 Colombia Americas 47704 28.03 9.19 2.35 78 17.6 98.45 93.4 9560.0 91.7 91.3
184 United Republic of Tanzania Africa 47783 44.85 4.89 5.36 59 54.0 55.53 73.2 1500.0 NaN NaN
138 Republic of Korea Western Pacific 49003 15.25 16.58 1.29 81 3.8 108.50 NaN 30370.0 99.3 98.4
159 South Africa Africa 52386 29.53 8.44 2.44 58 44.6 126.83 NaN 10710.0 NaN NaN
116 Myanmar South-East Asia 52797 25.28 8.15 1.98 65 52.3 2.57 92.3 NaN NaN NaN
83 Italy Europe 60885 14.04 26.97 1.45 82 3.8 157.93 98.9 32400.0 99.6 98.5
183 United Kingdom Europe 62783 17.54 23.06 1.90 80 4.8 130.75 NaN 36010.0 99.8 99.6
61 France Europe 63937 18.26 23.82 1.98 82 4.1 94.79 NaN 35910.0 99.1 99.3
47 Democratic Republic of the Congo Africa 65705 45.11 4.51 6.15 49 145.7 23.09 66.8 340.0 NaN NaN
170 Thailand South-East Asia 66785 18.47 13.96 1.43 74 13.2 111.63 NaN 8360.0 NaN NaN
177 Turkey Europe 73997 26.00 10.56 2.08 76 14.2 88.70 NaN 16940.0 99.5 98.3
79 Iran (Islamic Republic of) Eastern Mediterranean 76424 23.68 7.82 1.91 73 17.6 74.93 NaN NaN NaN NaN
53 Egypt Eastern Mediterranean 80722 31.25 8.62 2.85 73 21.0 101.08 72.0 6120.0 NaN NaN
65 Germany Europe 82800 13.17 26.72 1.40 81 4.1 132.30 NaN 40230.0 NaN NaN
190 Viet Nam Western Pacific 90796 22.87 9.32 1.79 75 23.0 143.39 93.2 3250.0 NaN NaN
58 Ethiopia Africa 91729 43.29 5.17 4.77 60 68.3 16.67 NaN 1110.0 84.8 79.5
134 Philippines Western Pacific 96707 34.53 6.21 3.11 69 29.8 99.30 NaN 4140.0 NaN NaN
109 Mexico Americas 121000 29.02 9.18 2.25 75 16.2 82.38 93.1 15390.0 99.2 99.9
85 Japan Western Pacific 127000 13.12 31.92 1.39 83 3.0 104.95 NaN 35330.0 NaN NaN
141 Russian Federation Europe 143000 15.45 18.60 1.51 69 10.3 179.31 99.6 20560.0 NaN NaN
13 Bangladesh South-East Asia 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN
124 Nigeria Africa 169000 44.23 4.49 6.02 53 123.7 58.58 61.3 2290.0 60.1 54.8
128 Pakistan Eastern Mediterranean 179000 34.31 6.44 3.35 67 85.9 61.61 NaN 2870.0 81.3 66.5
23 Brazil Americas 199000 24.56 10.81 1.82 74 14.4 124.26 NaN 11420.0 NaN NaN
78 Indonesia South-East Asia 247000 29.27 7.86 2.40 69 31.0 103.09 NaN 4500.0 NaN NaN
185 United States of America Americas 318000 19.63 19.31 2.00 79 7.1 92.72 NaN 48820.0 95.4 96.1
77 India South-East Asia 1240000 29.43 8.10 2.53 65 56.3 72.00 NaN 3590.0 NaN NaN
35 China Western Pacific 1390000 17.95 13.42 1.66 76 14.0 73.19 94.3 8390.0 NaN NaN

191 rows × 13 columns

In [516]:
df_who.sort_values('Population',ascending=True)
Out[516]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
125 Niue Western Pacific 1 30.61 9.07 NaN 72 25.1 NaN NaN NaN NaN NaN
179 Tuvalu Western Pacific 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN
118 Nauru Western Pacific 10 30.10 8.84 NaN 71 37.1 65.00 NaN NaN NaN NaN
129 Palau Western Pacific 21 30.10 8.84 NaN 72 20.8 74.94 NaN 11080.0 NaN NaN
39 Cook Islands Western Pacific 21 30.61 9.07 NaN 77 10.6 NaN NaN NaN 97.6 99.3
147 San Marino Europe 31 14.04 26.97 NaN 83 3.3 111.75 NaN NaN NaN NaN
111 Monaco Europe 38 18.26 23.82 NaN 82 3.8 89.73 NaN NaN NaN NaN
106 Marshall Islands Western Pacific 53 30.10 8.84 NaN 60 37.9 NaN NaN NaN NaN NaN
143 Saint Kitts and Nevis Americas 54 25.96 12.35 NaN 74 9.2 NaN NaN 16470.0 85.8 86.2
50 Dominica Americas 72 25.96 12.35 NaN 74 12.6 164.02 NaN 13000.0 NaN NaN
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
152 Seychelles Africa 92 21.95 10.05 2.23 74 13.1 145.71 91.8 25140.0 NaN NaN
89 Kiribati Western Pacific 101 30.10 8.84 3.01 67 59.9 13.64 NaN 3300.0 NaN NaN
110 Micronesia (Federated States of) Western Pacific 103 35.81 6.67 3.40 69 38.5 NaN NaN 3580.0 NaN NaN
68 Grenada Americas 105 26.96 9.72 2.22 74 13.5 NaN NaN 10350.0 NaN NaN
174 Tonga Western Pacific 105 37.33 7.96 3.86 72 12.8 52.63 NaN 5000.0 NaN NaN
145 Saint Vincent and the Grenadines Americas 109 25.70 9.92 2.05 74 23.4 120.52 NaN 10440.0 NaN NaN
144 Saint Lucia Americas 181 24.31 12.13 1.96 75 17.5 123.00 NaN 11220.0 90.2 89.2
148 Sao Tome and Principe Africa 188 41.60 4.76 4.22 63 53.2 68.26 89.2 2080.0 NaN NaN
146 Samoa Western Pacific 189 37.88 7.39 4.28 73 17.8 NaN 98.8 4270.0 93.2 97.1
188 Vanuatu Western Pacific 247 37.37 6.02 3.46 72 17.9 55.76 82.6 4330.0 NaN NaN
14 Barbados Americas 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
17 Belize Americas 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
76 Iceland Europe 326 20.71 17.62 2.11 82 2.3 106.08 NaN 31020.0 98.8 99.2
103 Maldives South-East Asia 338 29.03 6.65 2.31 77 10.5 165.72 NaN 7430.0 96.5 96.5
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
24 Brunei Darussalam Western Pacific 412 25.75 7.03 2.03 77 8.0 109.17 95.2 NaN NaN NaN
105 Malta Europe 428 14.98 22.87 1.37 80 6.8 124.86 NaN NaN 93.3 94.3
31 Cape Verde Africa 494 30.17 7.05 2.38 72 22.2 79.19 84.3 3980.0 94.6 92.4
... ... ... ... ... ... ... ... ... ... ... ... ... ...
181 Ukraine Europe 45530 14.18 20.76 1.45 71 10.7 122.98 99.7 7040.0 90.8 91.5
161 Spain Europe 46755 15.20 22.86 1.47 82 4.5 113.22 97.7 31400.0 99.7 99.8
36 Colombia Americas 47704 28.03 9.19 2.35 78 17.6 98.45 93.4 9560.0 91.7 91.3
184 United Republic of Tanzania Africa 47783 44.85 4.89 5.36 59 54.0 55.53 73.2 1500.0 NaN NaN
138 Republic of Korea Western Pacific 49003 15.25 16.58 1.29 81 3.8 108.50 NaN 30370.0 99.3 98.4
159 South Africa Africa 52386 29.53 8.44 2.44 58 44.6 126.83 NaN 10710.0 NaN NaN
116 Myanmar South-East Asia 52797 25.28 8.15 1.98 65 52.3 2.57 92.3 NaN NaN NaN
83 Italy Europe 60885 14.04 26.97 1.45 82 3.8 157.93 98.9 32400.0 99.6 98.5
183 United Kingdom Europe 62783 17.54 23.06 1.90 80 4.8 130.75 NaN 36010.0 99.8 99.6
61 France Europe 63937 18.26 23.82 1.98 82 4.1 94.79 NaN 35910.0 99.1 99.3
47 Democratic Republic of the Congo Africa 65705 45.11 4.51 6.15 49 145.7 23.09 66.8 340.0 NaN NaN
170 Thailand South-East Asia 66785 18.47 13.96 1.43 74 13.2 111.63 NaN 8360.0 NaN NaN
177 Turkey Europe 73997 26.00 10.56 2.08 76 14.2 88.70 NaN 16940.0 99.5 98.3
79 Iran (Islamic Republic of) Eastern Mediterranean 76424 23.68 7.82 1.91 73 17.6 74.93 NaN NaN NaN NaN
53 Egypt Eastern Mediterranean 80722 31.25 8.62 2.85 73 21.0 101.08 72.0 6120.0 NaN NaN
65 Germany Europe 82800 13.17 26.72 1.40 81 4.1 132.30 NaN 40230.0 NaN NaN
190 Viet Nam Western Pacific 90796 22.87 9.32 1.79 75 23.0 143.39 93.2 3250.0 NaN NaN
58 Ethiopia Africa 91729 43.29 5.17 4.77 60 68.3 16.67 NaN 1110.0 84.8 79.5
134 Philippines Western Pacific 96707 34.53 6.21 3.11 69 29.8 99.30 NaN 4140.0 NaN NaN
109 Mexico Americas 121000 29.02 9.18 2.25 75 16.2 82.38 93.1 15390.0 99.2 99.9
85 Japan Western Pacific 127000 13.12 31.92 1.39 83 3.0 104.95 NaN 35330.0 NaN NaN
141 Russian Federation Europe 143000 15.45 18.60 1.51 69 10.3 179.31 99.6 20560.0 NaN NaN
13 Bangladesh South-East Asia 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN
124 Nigeria Africa 169000 44.23 4.49 6.02 53 123.7 58.58 61.3 2290.0 60.1 54.8
128 Pakistan Eastern Mediterranean 179000 34.31 6.44 3.35 67 85.9 61.61 NaN 2870.0 81.3 66.5
23 Brazil Americas 199000 24.56 10.81 1.82 74 14.4 124.26 NaN 11420.0 NaN NaN
78 Indonesia South-East Asia 247000 29.27 7.86 2.40 69 31.0 103.09 NaN 4500.0 NaN NaN
185 United States of America Americas 318000 19.63 19.31 2.00 79 7.1 92.72 NaN 48820.0 95.4 96.1
77 India South-East Asia 1240000 29.43 8.10 2.53 65 56.3 72.00 NaN 3590.0 NaN NaN
35 China Western Pacific 1390000 17.95 13.42 1.66 76 14.0 73.19 94.3 8390.0 NaN NaN

191 rows × 13 columns

In [517]:
df_who.sort_values('Population',ascending=False)
Out[517]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
35 China Western Pacific 1390000 17.95 13.42 1.66 76 14.0 73.19 94.3 8390.0 NaN NaN
77 India South-East Asia 1240000 29.43 8.10 2.53 65 56.3 72.00 NaN 3590.0 NaN NaN
185 United States of America Americas 318000 19.63 19.31 2.00 79 7.1 92.72 NaN 48820.0 95.4 96.1
78 Indonesia South-East Asia 247000 29.27 7.86 2.40 69 31.0 103.09 NaN 4500.0 NaN NaN
23 Brazil Americas 199000 24.56 10.81 1.82 74 14.4 124.26 NaN 11420.0 NaN NaN
128 Pakistan Eastern Mediterranean 179000 34.31 6.44 3.35 67 85.9 61.61 NaN 2870.0 81.3 66.5
124 Nigeria Africa 169000 44.23 4.49 6.02 53 123.7 58.58 61.3 2290.0 60.1 54.8
13 Bangladesh South-East Asia 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN
141 Russian Federation Europe 143000 15.45 18.60 1.51 69 10.3 179.31 99.6 20560.0 NaN NaN
85 Japan Western Pacific 127000 13.12 31.92 1.39 83 3.0 104.95 NaN 35330.0 NaN NaN
109 Mexico Americas 121000 29.02 9.18 2.25 75 16.2 82.38 93.1 15390.0 99.2 99.9
134 Philippines Western Pacific 96707 34.53 6.21 3.11 69 29.8 99.30 NaN 4140.0 NaN NaN
58 Ethiopia Africa 91729 43.29 5.17 4.77 60 68.3 16.67 NaN 1110.0 84.8 79.5
190 Viet Nam Western Pacific 90796 22.87 9.32 1.79 75 23.0 143.39 93.2 3250.0 NaN NaN
65 Germany Europe 82800 13.17 26.72 1.40 81 4.1 132.30 NaN 40230.0 NaN NaN
53 Egypt Eastern Mediterranean 80722 31.25 8.62 2.85 73 21.0 101.08 72.0 6120.0 NaN NaN
79 Iran (Islamic Republic of) Eastern Mediterranean 76424 23.68 7.82 1.91 73 17.6 74.93 NaN NaN NaN NaN
177 Turkey Europe 73997 26.00 10.56 2.08 76 14.2 88.70 NaN 16940.0 99.5 98.3
170 Thailand South-East Asia 66785 18.47 13.96 1.43 74 13.2 111.63 NaN 8360.0 NaN NaN
47 Democratic Republic of the Congo Africa 65705 45.11 4.51 6.15 49 145.7 23.09 66.8 340.0 NaN NaN
61 France Europe 63937 18.26 23.82 1.98 82 4.1 94.79 NaN 35910.0 99.1 99.3
183 United Kingdom Europe 62783 17.54 23.06 1.90 80 4.8 130.75 NaN 36010.0 99.8 99.6
83 Italy Europe 60885 14.04 26.97 1.45 82 3.8 157.93 98.9 32400.0 99.6 98.5
116 Myanmar South-East Asia 52797 25.28 8.15 1.98 65 52.3 2.57 92.3 NaN NaN NaN
159 South Africa Africa 52386 29.53 8.44 2.44 58 44.6 126.83 NaN 10710.0 NaN NaN
138 Republic of Korea Western Pacific 49003 15.25 16.58 1.29 81 3.8 108.50 NaN 30370.0 99.3 98.4
184 United Republic of Tanzania Africa 47783 44.85 4.89 5.36 59 54.0 55.53 73.2 1500.0 NaN NaN
36 Colombia Americas 47704 28.03 9.19 2.35 78 17.6 98.45 93.4 9560.0 91.7 91.3
161 Spain Europe 46755 15.20 22.86 1.47 82 4.5 113.22 97.7 31400.0 99.7 99.8
181 Ukraine Europe 45530 14.18 20.76 1.45 71 10.7 122.98 99.7 7040.0 90.8 91.5
... ... ... ... ... ... ... ... ... ... ... ... ... ...
31 Cape Verde Africa 494 30.17 7.05 2.38 72 22.2 79.19 84.3 3980.0 94.6 92.4
105 Malta Europe 428 14.98 22.87 1.37 80 6.8 124.86 NaN NaN 93.3 94.3
24 Brunei Darussalam Western Pacific 412 25.75 7.03 2.03 77 8.0 109.17 95.2 NaN NaN NaN
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
103 Maldives South-East Asia 338 29.03 6.65 2.31 77 10.5 165.72 NaN 7430.0 96.5 96.5
76 Iceland Europe 326 20.71 17.62 2.11 82 2.3 106.08 NaN 31020.0 98.8 99.2
17 Belize Americas 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
14 Barbados Americas 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
188 Vanuatu Western Pacific 247 37.37 6.02 3.46 72 17.9 55.76 82.6 4330.0 NaN NaN
146 Samoa Western Pacific 189 37.88 7.39 4.28 73 17.8 NaN 98.8 4270.0 93.2 97.1
148 Sao Tome and Principe Africa 188 41.60 4.76 4.22 63 53.2 68.26 89.2 2080.0 NaN NaN
144 Saint Lucia Americas 181 24.31 12.13 1.96 75 17.5 123.00 NaN 11220.0 90.2 89.2
145 Saint Vincent and the Grenadines Americas 109 25.70 9.92 2.05 74 23.4 120.52 NaN 10440.0 NaN NaN
174 Tonga Western Pacific 105 37.33 7.96 3.86 72 12.8 52.63 NaN 5000.0 NaN NaN
68 Grenada Americas 105 26.96 9.72 2.22 74 13.5 NaN NaN 10350.0 NaN NaN
110 Micronesia (Federated States of) Western Pacific 103 35.81 6.67 3.40 69 38.5 NaN NaN 3580.0 NaN NaN
89 Kiribati Western Pacific 101 30.10 8.84 3.01 67 59.9 13.64 NaN 3300.0 NaN NaN
152 Seychelles Africa 92 21.95 10.05 2.23 74 13.1 145.71 91.8 25140.0 NaN NaN
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
50 Dominica Americas 72 25.96 12.35 NaN 74 12.6 164.02 NaN 13000.0 NaN NaN
143 Saint Kitts and Nevis Americas 54 25.96 12.35 NaN 74 9.2 NaN NaN 16470.0 85.8 86.2
106 Marshall Islands Western Pacific 53 30.10 8.84 NaN 60 37.9 NaN NaN NaN NaN NaN
111 Monaco Europe 38 18.26 23.82 NaN 82 3.8 89.73 NaN NaN NaN NaN
147 San Marino Europe 31 14.04 26.97 NaN 83 3.3 111.75 NaN NaN NaN NaN
39 Cook Islands Western Pacific 21 30.61 9.07 NaN 77 10.6 NaN NaN NaN 97.6 99.3
129 Palau Western Pacific 21 30.10 8.84 NaN 72 20.8 74.94 NaN 11080.0 NaN NaN
179 Tuvalu Western Pacific 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN
118 Nauru Western Pacific 10 30.10 8.84 NaN 71 37.1 65.00 NaN NaN NaN NaN
125 Niue Western Pacific 1 30.61 9.07 NaN 72 25.1 NaN NaN NaN NaN NaN

191 rows × 13 columns
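
The three calls above differ only in the ascending flag, which defaults to True. A minimal sketch on a small made-up frame (the toy data and its variable names are hypothetical, not taken from the WHO file) shows the same behaviour at a size that is easy to check by eye:

```python
import pandas as pd

# Hypothetical toy data standing in for df_who
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Population": [300, 10, 2000, 10],
})

asc = toy.sort_values("Population")                    # ascending=True is the default
desc = toy.sort_values("Population", ascending=False)  # largest first

# sort_values returns a new frame and keeps the original index labels;
# reset_index(drop=True) renumbers the rows if that is preferred.
desc_renumbered = desc.reset_index(drop=True)

print(asc)
print(desc_renumbered)
```

Note that toy itself is left unchanged unless inplace=True is passed or the result is assigned back.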

Sort values within each row

In [518]:
df= pd.read_csv('radj.csv',header=0)
df
Out[518]:
Name1 Name2
0 Umer Ali
1 Umer Ahmed
2 Ali Ahmed
3 Bilal Umer
4 Bilal Ali
5 Ahmed Bilal
In [519]:
pd.DataFrame(np.sort(df.values, axis=1), index=df.index, columns=df.columns)
Out[519]:
Name1 Name2
0 Ali Umer
1 Ahmed Umer
2 Ahmed Ali
3 Bilal Umer
4 Ali Bilal
5 Ahmed Bilal
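
Since radj.csv is not included here, the following sketch rebuilds the same six rows by hand (the variable names pairs and sorted_pairs are my own). It shows that np.sort with axis=1 reorders the values inside each row, so every pair ends up in alphabetical order regardless of which column a name started in:

```python
import numpy as np
import pandas as pd

# Same values as the radj.csv output above, typed in by hand
pairs = pd.DataFrame({
    "Name1": ["Umer", "Umer", "Ali", "Bilal", "Bilal", "Ahmed"],
    "Name2": ["Ali", "Ahmed", "Ahmed", "Umer", "Ali", "Bilal"],
})

# axis=1 sorts the values inside each row, left to right
sorted_pairs = pd.DataFrame(
    np.sort(pairs.values, axis=1),
    index=pairs.index,
    columns=pairs.columns,
)
print(sorted_pairs)
```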

Sort by multiple columns

In [520]:
df=pd.read_csv("GTRX.txt",skiprows=1,sep=',')
In [521]:
df=df.sort_values(['Cell Name','Is Main BCCH TRX'],ascending=[True,False])
In [522]:
df.to_csv('output8.csv')
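
A small self-contained sketch of the same multi-column sort, assuming made-up rows (only the two sort column names come from the GTRX.txt header; the cell names and frequencies are hypothetical). Each entry in the ascending list applies to the column at the same position:

```python
import pandas as pd

# Hypothetical rows; only the two sort columns mirror the GTRX.txt header
trx = pd.DataFrame({
    "Cell Name": ["B_cell", "A_cell", "A_cell", "B_cell"],
    "Is Main BCCH TRX": ["NO", "NO", "YES", "YES"],
    "Frequency": [31, 42, 48, 25],
})

# Cell names A to Z; within each cell, "YES" before "NO" (descending string order)
trx_sorted = trx.sort_values(
    ["Cell Name", "Is Main BCCH TRX"],
    ascending=[True, False],
)
print(trx_sorted)
```

When writing the result out, to_csv(..., index=False) omits the row labels if they are not wanted in the file.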

argsort

In [523]:
df = pd.DataFrame({
     'a': [4, 5, 6, 7],
     'b': [10, 20, 30, 40],
     'c': [100, 50, -30, -50]
})
df
Out[523]:
a b c
0 4 10 100
1 5 20 50
2 6 30 -30
3 7 40 -50
In [524]:
df.loc[(df.c - 43).abs().argsort()]
Out[524]:
a b c
1 5 20 50
0 4 10 100
2 6 30 -30
3 7 40 -50
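
The cell above orders the rows by how close column c is to 43. A step-by-step sketch of the same idea follows (the names df_demo, distance and order are my own). Because argsort returns positions rather than labels, the sketch uses iloc, which should also work when the index is not the default 0..n-1 range:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "a": [4, 5, 6, 7],
    "b": [10, 20, 30, 40],
    "c": [100, 50, -30, -50],
})

distance = (df_demo["c"] - 43).abs()   # 57, 7, 73, 93
order = distance.argsort()             # positions that would sort `distance`

# argsort yields positions, so iloc is the safer indexer when the
# index is not the default 0..n-1 range
closest_first = df_demo.iloc[order.to_numpy()]
print(closest_first)
```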

rank

  • The rank method returns the rank of each value in the Series it is called on.
  • Ranks are assigned by position after sorting; tied values share the average of their positions by default (hence the 2.5 and 4.5 in the output below).
In [525]:
df_who["Rank"] = df_who["Population"].rank(ascending=True) 
In [526]:
df_who.head()
Out[526]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale Rank
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN 62.0
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4 11.0
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5 12.0
6 Argentina Americas 41087 24.42 14.97 2.20 76 14.2 134.92 97.8 17130.0 NaN NaN 160.0
7 Armenia Europe 2969 20.34 14.06 1.74 71 16.4 103.57 99.6 6100.0 NaN NaN 60.0
In [527]:
df_who.sort_values("Population", inplace = True)
In [528]:
df_who.head()
Out[528]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale Rank
125 Niue Western Pacific 1 30.61 9.07 NaN 72 25.1 NaN NaN NaN NaN NaN 1.0
179 Tuvalu Western Pacific 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN 2.5
118 Nauru Western Pacific 10 30.10 8.84 NaN 71 37.1 65.00 NaN NaN NaN NaN 2.5
129 Palau Western Pacific 21 30.10 8.84 NaN 72 20.8 74.94 NaN 11080.0 NaN NaN 4.5
39 Cook Islands Western Pacific 21 30.61 9.07 NaN 77 10.6 NaN NaN NaN 97.6 99.3 4.5
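
The fractional ranks above come from ties. As a hedged illustration, the sketch below ranks the five smallest population values from that output with three of the tie-handling options of Series.rank (the variable names are my own):

```python
import pandas as pd

# The five smallest populations from the output above
pop = pd.Series([1, 10, 10, 21, 21])

avg_rank = pop.rank()                   # method="average" is the default
min_rank = pop.rank(method="min")       # ties share the lowest rank
dense_rank = pop.rank(method="dense")   # like "min", but with no gaps

print(pd.DataFrame({"pop": pop, "average": avg_rank,
                    "min": min_rank, "dense": dense_rank}))
```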

Pandas trim leading and trailing white space in a dataframe

Example-1

In [529]:
df=pd.DataFrame({
'Day':[' 1/1/2019 ',' 1/2/2019 ','          1/3/2019','     1/4/2019'],
'Temperature':[30,32,34,36],
'Windspeed':[6,7,10,12],
'Event':[' Rain ','         Su nny ','Rain','     Sunny']
})
df
Out[529]:
Day Temperature Windspeed Event
0 1/1/2019 30 6 Rain
1 1/2/2019 32 7 Su nny
2 1/3/2019 34 10 Rain
3 1/4/2019 36 12 Sunny
In [530]:
df[(df.Event=='Rain')]
Out[530]:
Day Temperature Windspeed Event
2 1/3/2019 34 10 Rain
In [531]:
df[(df.Day=='1/3/2019')]
Out[531]:
Day Temperature Windspeed Event
In [532]:
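# Strip leading/trailing whitespace from every string cell; other values pass through.
# Note: applymap is deprecated since pandas 2.1 in favor of DataFrame.map.
df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)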
df
Out[532]:
Day Temperature Windspeed Event
0 1/1/2019 30 6 Rain
1 1/2/2019 32 7 Su nny
2 1/3/2019 34 10 Rain
3 1/4/2019 36 12 Sunny
In [533]:
df[(df.Event=='Rain')]
Out[533]:
Day Temperature Windspeed Event
0 1/1/2019 30 6 Rain
2 1/3/2019 34 10 Rain
In [534]:
df[(df.Day=='1/3/2019')]
Out[534]:
Day Temperature Windspeed Event
2 1/3/2019 34 10 Rain
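
An alternative sketch, assuming the text columns are known in advance: the vectorised .str.strip() accessor can be applied column by column instead of visiting every cell. The frame below repeats the example data without the Windspeed column; weather, text_cols and rain_days are names of my own choosing:

```python
import pandas as pd

weather = pd.DataFrame({
    "Day": [" 1/1/2019 ", " 1/2/2019 ", "          1/3/2019", "     1/4/2019"],
    "Temperature": [30, 32, 34, 36],
    "Event": [" Rain ", "         Su nny ", "Rain", "     Sunny"],
})

# Strip only the text columns; numeric columns are left untouched
text_cols = ["Day", "Event"]
for col in text_cols:
    weather[col] = weather[col].str.strip()

rain_days = weather[weather["Event"] == "Rain"]
print(rain_days)
```

As in the original example, strip only touches the ends of a string, so the inner space in "Su nny" remains.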

Example-2

In [535]:
df='     hello world!'
df
Out[535]:
'     hello world!'
In [536]:
df.lstrip()
Out[536]:
'hello world!'
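
For comparison, a short sketch of the three built-in string methods on a plain Python string (the sample text adds trailing spaces so that the difference is visible):

```python
text = "     hello world!   "

left = text.lstrip()    # removes leading whitespace only
right = text.rstrip()   # removes trailing whitespace only
both = text.strip()     # removes whitespace at both ends

print(repr(left), repr(right), repr(both))
```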

lambda

In [537]:
df=pd.read_csv("GTRX.txt",skiprows=1)
In [538]:
def band(Frequency):
    if Frequency<100:
        return 'GSM'
    else: 
        return 'DCS'
In [539]:
df['Band']=df['Frequency'].apply(lambda x:band(x))
In [540]:
df.head()
Out[540]:
BSC Name Cell Name Cell Index TRX ID TRX Name Frequency Is Main BCCH TRX TRX No. Temporarily Authorized TRX Group ID Active Status Administrative State Band
0 HFSDBSC07 14542_Sat Gharaa Okara 0 116 14542_Sat Gharaa P3 Sahiwal0 48 YES 0 NO 4294967295 ACTIVATED Unlock GSM
1 HFSDBSC07 14542_Sat Gharaa Okara 0 117 14542_Sat Gharaa P3 Sahiwal1 31 NO 1 NO 4294967295 ACTIVATED Unlock GSM
2 HFSDBSC07 14542_Sat Gharaa Okara 0 118 14542_Sat Gharaa P3 Sahiwal2 42 NO 2 NO 4294967295 ACTIVATED Unlock GSM
3 HFSDBSC07 14815_Chak No 214 Abadi Gojra More Jhang 1 2 14815_Chak # 214 Abadi Gojra More P3 Jhang0 25 NO 3 NO 4294967295 ACTIVATED Unlock GSM
4 HFSDBSC07 14815_Chak No 214 Abadi Gojra More Jhang 1 38 14815_Chak # 214 Abadi Gojra More P3 Jhang1 45 YES 2 NO 4294967295 ACTIVATED Unlock GSM
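
A self-contained sketch of the same apply pattern, assuming a few hypothetical channel numbers (the first three appear in the output above, the last two are made up). It also shows that the function can be passed to apply directly, which gives the same result as wrapping it in a lambda:

```python
import pandas as pd

def band(frequency):
    """Label a channel number using the same threshold as the cell above."""
    return "GSM" if frequency < 100 else "DCS"

# Hypothetical channel numbers; the first three appear in the output above
freq = pd.DataFrame({"Frequency": [48, 31, 42, 556, 585]})

# Passing the function directly is equivalent to lambda x: band(x)
freq["Band"] = freq["Frequency"].apply(band)
print(freq)
```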

where

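  • pandas DataFrame.where returns an object of the same shape as self whose entries are taken from self where cond is True and from other otherwise.
  • The examples below use NumPy's np.where(cond, x, y) instead, which picks from x where cond is True and from y otherwise; nesting the calls chains several conditions.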

Example-1

In [541]:
df=pd.read_csv("GTRX.txt",skiprows=1)
In [542]:
df['Band'] = np.where(
            ((df['Frequency']>=25) & (df['Frequency']<=62)),
            'GSM', 
            np.where(
                    (df['Frequency']>=556) & (df['Frequency']<=585), 
                    'DCS', 
                     'Other Band'))
In [543]:
df.head()
Out[543]:
BSC Name Cell Name Cell Index TRX ID TRX Name Frequency Is Main BCCH TRX TRX No. Temporarily Authorized TRX Group ID Active Status Administrative State Band
0 HFSDBSC07 14542_Sat Gharaa Okara 0 116 14542_Sat Gharaa P3 Sahiwal0 48 YES 0 NO 4294967295 ACTIVATED Unlock GSM
1 HFSDBSC07 14542_Sat Gharaa Okara 0 117 14542_Sat Gharaa P3 Sahiwal1 31 NO 1 NO 4294967295 ACTIVATED Unlock GSM
2 HFSDBSC07 14542_Sat Gharaa Okara 0 118 14542_Sat Gharaa P3 Sahiwal2 42 NO 2 NO 4294967295 ACTIVATED Unlock GSM
3 HFSDBSC07 14815_Chak No 214 Abadi Gojra More Jhang 1 2 14815_Chak # 214 Abadi Gojra More P3 Jhang0 25 NO 3 NO 4294967295 ACTIVATED Unlock GSM
4 HFSDBSC07 14815_Chak No 214 Abadi Gojra More Jhang 1 38 14815_Chak # 214 Abadi Gojra More P3 Jhang1 45 YES 2 NO 4294967295 ACTIVATED Unlock GSM
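
Nested np.where calls become hard to read as conditions accumulate. One possible alternative, sketched here on hypothetical channel numbers, is np.select, which takes a list of conditions and a matching list of labels. This is not the notebook's method, only an equivalent formulation of the same band rules:

```python
import numpy as np
import pandas as pd

# Hypothetical channel numbers chosen to hit every branch
freq = pd.DataFrame({"Frequency": [25, 48, 62, 556, 585, 700]})

conditions = [
    freq["Frequency"].between(25, 62),     # inclusive on both ends
    freq["Frequency"].between(556, 585),
]
labels = ["GSM", "DCS"]

# The first matching condition wins; rows matching none get the default
freq["Band"] = np.select(conditions, labels, default="Other Band")
print(freq)
```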

Example-2

In [544]:
df=pd.read_csv("GTRX.txt",skiprows=1,sep=',')
In [545]:
df['CI'] = np.where(
            (df['Cell Name'].str.startswith ('CI-')),
            df['Cell Name'].str[8]+""+df['Cell Name'].str[3:7], 
            np.where(
                    (df['Cell Name'].str.startswith ('CII-')), 
                    df['Cell Name'].str[9]+""+df['Cell Name'].str[4:8], 
                    np.where(
                            (df['Cell Name'].str.startswith ('S-')),
                             df['Cell Name'].str[7]+""+df['Cell Name'].str[2:5],
                              np.where(
                                      (df['Cell Name'].str.startswith ('N-')),
                                        df['Cell Name'].str[7]+""+df['Cell Name'].str[2:5],
                                       df['Cell Name'].str[:5]))))
In [546]:
df.head()
Out[546]:
BSC Name Cell Name Cell Index TRX ID TRX Name Frequency Is Main BCCH TRX TRX No. Temporarily Authorized TRX Group ID Active Status Administrative State CI
0 HFSDBSC07 14542_Sat Gharaa Okara 0 116 14542_Sat Gharaa P3 Sahiwal0 48 YES 0 NO 4294967295 ACTIVATED Unlock 14542
1 HFSDBSC07 14542_Sat Gharaa Okara 0 117 14542_Sat Gharaa P3 Sahiwal1 31 NO 1 NO 4294967295 ACTIVATED Unlock 14542
2 HFSDBSC07 14542_Sat Gharaa Okara 0 118 14542_Sat Gharaa P3 Sahiwal2 42 NO 2 NO 4294967295 ACTIVATED Unlock 14542
3 HFSDBSC07 14815_Chak No 214 Abadi Gojra More Jhang 1 2 14815_Chak # 214 Abadi Gojra More P3 Jhang0 25 NO 3 NO 4294967295 ACTIVATED Unlock 14815
4 HFSDBSC07 14815_Chak No 214 Abadi Gojra More Jhang 1 38 14815_Chak # 214 Abadi Gojra More P3 Jhang1 45 YES 2 NO 4294967295 ACTIVATED Unlock 14815
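
To make the string slicing in Example-2 easier to follow, here is a reduced sketch with a single prefix rule. The "CI-4542-1 Site" name is a made-up format for illustration only (the real naming scheme is not shown in the output above); the second name is copied from that output:

```python
import numpy as np
import pandas as pd

# The "CI-" name is a made-up format for illustration; the second name
# is taken from the output above
cells = pd.DataFrame({"Cell Name": ["CI-4542-1 Site", "14542_Sat Gharaa Okara"]})
name = cells["Cell Name"]

cells["CI"] = np.where(
    name.str.startswith("CI-"),
    name.str[8] + name.str[3:7],   # character 8, then characters 3-6
    name.str[:5],                  # fallback: first five characters
)
print(cells)
```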

Split a string into multiple columns

Method-1

In [547]:
spl= pd.read_csv('Scopus_1926-1950.csv',encoding='latin-1')
# Preprocessing
# Remove spaces inside Author(s)ID
spl['Author(s)ID'] = spl['Author(s)ID'].str.replace(" ","")
# Strip the trailing ';' from Author(s)ID
spl['Author(s)ID'] = spl['Author(s)ID'].str.rstrip(';')
  • Suppose we want to split the "Author(s)ID" column into five separate columns: A0, A1, A2, A3, A4. We use the str.split() method, tell it to split on the semicolon character, and expand the results into a DataFrame.
In [548]:
a=spl['Author(s)ID'].str.split(';', expand=True).add_prefix('A')
In [549]:
spl=pd.concat([spl, a], axis=1)
spl.head()
Out[549]:
Authors Author(s)ID Title Year Sourcetitle Volume Issue Art.No. Pagestart Pageend ... DocumentType PublicationStage AccessType Source EID A0 A1 A2 A3 A4
0 Bhatnagar, S.S., Prasad, M., Singh, B. 55429686600;57197403605;55480056700 Einige physikalische Eigenschaften von einwert... 1926 Kolloid-Zeitschrift 38.0 3 NaN 218.0 222.0 ... Article Final NaN Scopus 2-s2.0-34347100553 55429686600 57197403605 55480056700 None None
1 Malik, K.S. 16664426200 Viskositäten einwertiger Salze der höheren Fet... 1926 Kolloid-Zeitschrift 39.0 4 NaN 322.0 324.0 ... Article Final NaN Scopus 2-s2.0-34347109543 16664426200 None None None None
2 Christensen, J. 57190353901 THE NEW AFGHANISTAN. 1926 The Muslim World 16.0 4 NaN 349.0 356.0 ... Article Final NaN Scopus 2-s2.0-84980098079 57190353901 None None None None
3 OSMASTON, B.B. 57190078988 The Birda of Ladakh 1926 Ibis 68.0 2 NaN 446.0 448.0 ... Letter Final NaN Scopus 2-s2.0-84977249430 57190078988 None None None None
4 Bakhsh, J.A. 57190457552 THE STORY OF MY CONVERSION 1926 The Muslim World 16.0 1 NaN 79.0 84.0 ... Article Final NaN Scopus 2-s2.0-84894912026 57190457552 None None None None

5 rows × 24 columns

Method-2

In [550]:
spl= pd.read_csv('Scopus_1926-1950.csv',encoding='latin-1')
# Preprocessing
# Remove spaces inside Author(s)ID
spl['Author(s)ID'] = spl['Author(s)ID'].str.replace(" ","")
# Strip the trailing ';' from Author(s)ID
spl['Author(s)ID'] = spl['Author(s)ID'].str.rstrip(';')
  • These five columns can also be saved to the original DataFrame in a single assignment statement (the number of names on the left must match the number of columns the split produces):
In [551]:
spl[['Author0', 'Author1', 'Author2','Author3','Author4']] = spl['Author(s)ID'].str.split(';', expand=True)
In [552]:
spl.head()
Out[552]:
Authors Author(s)ID Title Year Sourcetitle Volume Issue Art.No. Pagestart Pageend ... DocumentType PublicationStage AccessType Source EID Author0 Author1 Author2 Author3 Author4
0 Bhatnagar, S.S., Prasad, M., Singh, B. 55429686600;57197403605;55480056700 Einige physikalische Eigenschaften von einwert... 1926 Kolloid-Zeitschrift 38.0 3 NaN 218.0 222.0 ... Article Final NaN Scopus 2-s2.0-34347100553 55429686600 57197403605 55480056700 None None
1 Malik, K.S. 16664426200 Viskositäten einwertiger Salze der höheren Fet... 1926 Kolloid-Zeitschrift 39.0 4 NaN 322.0 324.0 ... Article Final NaN Scopus 2-s2.0-34347109543 16664426200 None None None None
2 Christensen, J. 57190353901 THE NEW AFGHANISTAN. 1926 The Muslim World 16.0 4 NaN 349.0 356.0 ... Article Final NaN Scopus 2-s2.0-84980098079 57190353901 None None None None
3 OSMASTON, B.B. 57190078988 The Birda of Ladakh 1926 Ibis 68.0 2 NaN 446.0 448.0 ... Letter Final NaN Scopus 2-s2.0-84977249430 57190078988 None None None None
4 Bakhsh, J.A. 57190457552 THE STORY OF MY CONVERSION 1926 The Muslim World 16.0 1 NaN 79.0 84.0 ... Article Final NaN Scopus 2-s2.0-84894912026 57190457552 None None None None

5 rows × 24 columns

Method-3

In [553]:
spl2= pd.read_csv('Scopus_1926-1950.csv',encoding='latin-1')

# Preprocessing

# Remove spaces inside Author(s)ID
spl2['Author(s)ID'] = spl2['Author(s)ID'].str.replace(" ","")
# Strip the trailing ';' from Author(s)ID
spl2['Author(s)ID'] = spl2['Author(s)ID'].str.rstrip(';')
In [554]:
spl2 = spl2.join(spl2['Author(s)ID'].str.split(';', expand=True).add_prefix('Author_'))
In [555]:
spl2.head()
Out[555]:
Authors Author(s)ID Title Year Sourcetitle Volume Issue Art.No. Pagestart Pageend ... DocumentType PublicationStage AccessType Source EID Author_0 Author_1 Author_2 Author_3 Author_4
0 Bhatnagar, S.S., Prasad, M., Singh, B. 55429686600;57197403605;55480056700 Einige physikalische Eigenschaften von einwert... 1926 Kolloid-Zeitschrift 38.0 3 NaN 218.0 222.0 ... Article Final NaN Scopus 2-s2.0-34347100553 55429686600 57197403605 55480056700 None None
1 Malik, K.S. 16664426200 Viskositäten einwertiger Salze der höheren Fet... 1926 Kolloid-Zeitschrift 39.0 4 NaN 322.0 324.0 ... Article Final NaN Scopus 2-s2.0-34347109543 16664426200 None None None None
2 Christensen, J. 57190353901 THE NEW AFGHANISTAN. 1926 The Muslim World 16.0 4 NaN 349.0 356.0 ... Article Final NaN Scopus 2-s2.0-84980098079 57190353901 None None None None
3 OSMASTON, B.B. 57190078988 The Birda of Ladakh 1926 Ibis 68.0 2 NaN 446.0 448.0 ... Letter Final NaN Scopus 2-s2.0-84977249430 57190078988 None None None None
4 Bakhsh, J.A. 57190457552 THE STORY OF MY CONVERSION 1926 The Muslim World 16.0 1 NaN 79.0 84.0 ... Article Final NaN Scopus 2-s2.0-84894912026 57190457552 None None None None

5 rows × 24 columns
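
Since the Scopus file is not included here, the sketch below replays the split on three ID strings copied from the output above (papers and parts are names of my own). It follows the Method-3 pattern: split on ";", prefix the numbered columns, then join them back:

```python
import pandas as pd

# Three ID strings copied from the output above
papers = pd.DataFrame({
    "Author(s)ID": [
        "55429686600;57197403605;55480056700",
        "16664426200",
        "57190353901",
    ]
})

parts = papers["Author(s)ID"].str.split(";", expand=True)  # one column per ID
parts = parts.add_prefix("Author_")                        # Author_0, Author_1, ...

papers = papers.join(parts)
print(papers)
```

The number of new columns follows the row with the most IDs (three in this sample); shorter rows are padded with missing values.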

Special Case

  • If we only care about the first author ID, we can select just that column from the split result and save it to the DataFrame:
In [556]:
spl1= pd.read_csv('Scopus_1926-1950.csv',encoding='latin-1')

# Preprocessing
# Remove spaces inside Author(s)ID
spl1['Author(s)ID'] = spl1['Author(s)ID'].str.replace(" ","")
# Strip the trailing ';' from Author(s)ID
spl1['Author(s)ID'] = spl1['Author(s)ID'].str.rstrip(';')
In [557]:
spl1['FAuthor'] = spl1['Author(s)ID'].str.split(';', expand=True)[0]
spl1.head()
Out[557]:
Authors Author(s)ID Title Year Sourcetitle Volume Issue Art.No. Pagestart Pageend Pagecount Citedby DOI Link DocumentType PublicationStage AccessType Source EID FAuthor
0 Bhatnagar, S.S., Prasad, M., Singh, B. 55429686600;57197403605;55480056700 Einige physikalische Eigenschaften von einwert... 1926 Kolloid-Zeitschrift 38.0 3 NaN 218.0 222.0 NaN NaN 10.1007/BF01460832 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-34347100553 55429686600
1 Malik, K.S. 16664426200 Viskositäten einwertiger Salze der höheren Fet... 1926 Kolloid-Zeitschrift 39.0 4 NaN 322.0 324.0 NaN NaN 10.1007/BF01432039 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-34347109543 16664426200
2 Christensen, J. 57190353901 THE NEW AFGHANISTAN. 1926 The Muslim World 16.0 4 NaN 349.0 356.0 NaN NaN 10.1111/j.1478-1913.1926.tb00635.x https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84980098079 57190353901
3 OSMASTON, B.B. 57190078988 The Birda of Ladakh 1926 Ibis 68.0 2 NaN 446.0 448.0 NaN NaN 10.1111/j.1474-919X.1926.tb07597.x https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84977249430 57190078988
4 Bakhsh, J.A. 57190457552 THE STORY OF MY CONVERSION 1926 The Muslim World 16.0 1 NaN 79.0 84.0 NaN 1.0 10.1111/j.1478-1913.1926.tb00605.x https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84894912026 57190457552
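
When only the first piece is needed, a possible variation is to skip expand=True and index into the list that split produces. The sketch below uses two ID strings from the output above; ids and first_author are hypothetical names:

```python
import pandas as pd

ids = pd.Series(["55429686600;57197403605;55480056700", "16664426200"])

# n=1 stops after the first separator; .str[0] picks the first piece
first_author = ids.str.split(";", n=1).str[0]
print(first_author)
```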

11.22 8 ways to apply LEFT, RIGHT, MID in Pandas

  • At times, you may need to extract specific characters within a string.
  • You may then apply the concepts of Left, Right, and Mid in pandas to obtain your desired characters within a string.
  • The following 8 scenarios show how to extract specific characters:

1- From the left

In [558]:
Data = pd.DataFrame({'Identifier': ['55555-abc','77777-xyz','99999-mmm']})
Left = Data['Identifier'].str[:5]
Left
Out[558]:
0    55555
1    77777
2    99999
Name: Identifier, dtype: object

2- From the right

In [559]:
Data = pd.DataFrame({'Identifier': ['55555-abc','77777-xyz','99999-mmm']})
Right = Data['Identifier'].str[-3:]
Right
Out[559]:
0    abc
1    xyz
2    mmm
Name: Identifier, dtype: object

3- From the middle

In [560]:
Data = pd.DataFrame({'Identifier': ['ID-55555-End','ID-77777-End','ID-99999-End']})
Mid = Data['Identifier'].str[3:8]
Mid
Out[560]:
0    55555
1    77777
2    99999
Name: Identifier, dtype: object

4- Before a symbol

In [561]:
Data = pd.DataFrame({'Identifier': ['Umer-55555-H','Ali-77777-A','Ahmed-99999-S']})
BeforeSymbol = Data['Identifier'].str.split('-').str[0]
BeforeSymbol
Out[561]:
0     Umer
1      Ali
2    Ahmed
Name: Identifier, dtype: object

5- Before space

In [562]:
Data = pd.DataFrame({'Identifier': ['111 IDAA','2222222 IDB','33 IDCCC']})
BeforeSpace = Data['Identifier'].str.split(' ').str[0]
BeforeSpace
Out[562]:
0        111
1    2222222
2         33
Name: Identifier, dtype: object

6- After a symbol

In [563]:
Data = pd.DataFrame({'Identifier': ['IDAA-111','IDB-2222222','IDCCC-33']})
AfterSymbol = Data['Identifier'].str.split('-').str[1]
AfterSymbol
Out[563]:
0        111
1    2222222
2         33
Name: Identifier, dtype: object

7- Between identical symbols

In [564]:
Data = pd.DataFrame({'Identifier': ['IDAA-111-AA','IDB-2222222-B','IDCCC-33-CCC']})
BetweenTwoSymbols = Data['Identifier'].str.split('-').str[1]
BetweenTwoSymbols
Out[564]:
0        111
1    2222222
2         33
Name: Identifier, dtype: object

8- Between different symbols

In [565]:
Data = pd.DataFrame({'Identifier': ['IDAA-111$AA','IDB-2222222$B','IDCCC-33$CCC']})
betweenTwoDifferentSymbols = Data['Identifier'].str.split('-').str[1]
betweenTwoDifferentSymbols = betweenTwoDifferentSymbols.str.split('$').str[0]
betweenTwoDifferentSymbols
Out[565]:
0        111
1    2222222
2         33
Name: Identifier, dtype: object
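
The two-step split in scenario 8 can also be written as one regular expression. This is an alternative sketch, not the notebook's method; it uses Series.str.extract with a single capture group, and the names data and between are my own:

```python
import pandas as pd

data = pd.DataFrame({"Identifier": ["IDAA-111$AA", "IDB-2222222$B", "IDCCC-33$CCC"]})

# Capture everything between the first "-" and the next "$";
# "$" is escaped because it is a regex metacharacter
between = data["Identifier"].str.extract(r"-([^$]*)\$", expand=False)
print(between)
```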

11.23 Missing data handling

  • The process of cleaning messy data is called data munging or data wrangling.
In [566]:
from IPython.display import YouTubeVideo
YouTubeVideo('4R4WsDJ-KV',width=900, height=500)
Out[566]:
In [567]:
df= pd.read_csv("Missing_Values_Handling.csv")
df
Out[567]:
Day Temperature Windspeed Event
0 1/1/2019 32.0 6.0 Rain
1 1/4/2019 NaN 9.0 Sunny
2 1/5/2019 28.0 NaN Snow
3 1/6/2019 NaN 7.0 NaN
4 1/7/2019 32.0 NaN Rain
5 1/8/2019 NaN NaN Sunny
6 1/9/2019 NaN NaN NaN
7 1/10/2019 34.0 8.0 Cloudy
8 1/11/2019 40.0 12.0 Sunny
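
Before filling anything, it can help to see where the gaps are. The sketch below rebuilds the first four rows of the table above by hand (the CSV itself is not included) and counts missing values with isna; the variable names are my own:

```python
import numpy as np
import pandas as pd

# First four rows of the table above, typed in by hand
weather = pd.DataFrame({
    "Day": ["1/1/2019", "1/4/2019", "1/5/2019", "1/6/2019"],
    "Temperature": [32.0, np.nan, 28.0, np.nan],
    "Windspeed": [6.0, 9.0, np.nan, 7.0],
    "Event": ["Rain", "Sunny", "Snow", np.nan],
})

missing_per_column = weather.isna().sum()              # NaN count per column
rows_with_gaps = weather[weather.isna().any(axis=1)]   # rows with at least one NaN

print(missing_per_column)
print(rows_with_gaps)
```
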
In [568]:
from IPython.display import YouTubeVideo
YouTubeVideo('EaGbS7eWSs0',width=900, height=500)
Out[568]:

Fill NaN with specific value

In [569]:
df.fillna(value=0)
Out[569]:
Day Temperature Windspeed Event
0 1/1/2019 32.0 6.0 Rain
1 1/4/2019 0.0 9.0 Sunny
2 1/5/2019 28.0 0.0 Snow
3 1/6/2019 0.0 7.0 0
4 1/7/2019 32.0 0.0 Rain
5 1/8/2019 0.0 0.0 Sunny
6 1/9/2019 0.0 0.0 0
7 1/10/2019 34.0 8.0 Cloudy
8 1/11/2019 40.0 12.0 Sunny

Fill NaN with Customized Solutions

In [570]:
df.fillna({
        'Temperature':0,
        'Windspeed':0,
        'Event':'No Event'
        })
Out[570]:
Day Temperature Windspeed Event
0 1/1/2019 32.0 6.0 Rain
1 1/4/2019 0.0 9.0 Sunny
2 1/5/2019 28.0 0.0 Snow
3 1/6/2019 0.0 7.0 No Event
4 1/7/2019 32.0 0.0 Rain
5 1/8/2019 0.0 0.0 Sunny
6 1/9/2019 0.0 0.0 No Event
7 1/10/2019 34.0 8.0 Cloudy
8 1/11/2019 40.0 12.0 Sunny

Fill NaN with forward fill method

In [571]:
df.fillna(method="ffill")
Out[571]:
Day Temperature Windspeed Event
0 1/1/2019 32.0 6.0 Rain
1 1/4/2019 32.0 9.0 Sunny
2 1/5/2019 28.0 9.0 Snow
3 1/6/2019 28.0 7.0 Snow
4 1/7/2019 32.0 7.0 Rain
5 1/8/2019 32.0 7.0 Sunny
6 1/9/2019 32.0 7.0 Sunny
7 1/10/2019 34.0 8.0 Cloudy
8 1/11/2019 40.0 12.0 Sunny

Fill NaN with forward fill method and set limit

In [572]:
df.fillna(method="ffill",limit=1)
Out[572]:
Day Temperature Windspeed Event
0 1/1/2019 32.0 6.0 Rain
1 1/4/2019 32.0 9.0 Sunny
2 1/5/2019 28.0 9.0 Snow
3 1/6/2019 28.0 7.0 Snow
4 1/7/2019 32.0 7.0 Rain
5 1/8/2019 32.0 NaN Sunny
6 1/9/2019 NaN NaN Sunny
7 1/10/2019 34.0 8.0 Cloudy
8 1/11/2019 40.0 12.0 Sunny

Fill NaN with backward fill method

In [573]:
df.fillna(method="bfill")
Out[573]:
Day Temperature Windspeed Event
0 1/1/2019 32.0 6.0 Rain
1 1/4/2019 28.0 9.0 Sunny
2 1/5/2019 28.0 7.0 Snow
3 1/6/2019 32.0 7.0 Rain
4 1/7/2019 32.0 8.0 Rain
5 1/8/2019 34.0 8.0 Sunny
6 1/9/2019 34.0 8.0 Cloudy
7 1/10/2019 34.0 8.0 Cloudy
8 1/11/2019 40.0 12.0 Sunny

Fill NaN with backward fill method and set limit

In [574]:
df.fillna(method="bfill",limit=1)
Out[574]:
Day Temperature Windspeed Event
0 1/1/2019 32.0 6.0 Rain
1 1/4/2019 28.0 9.0 Sunny
2 1/5/2019 28.0 7.0 Snow
3 1/6/2019 32.0 7.0 Rain
4 1/7/2019 32.0 NaN Rain
5 1/8/2019 NaN NaN Sunny
6 1/9/2019 34.0 8.0 Cloudy
7 1/10/2019 34.0 8.0 Cloudy
8 1/11/2019 40.0 12.0 Sunny
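
Recent pandas releases (2.1 and later) deprecate the `method=` argument of `fillna`. The dedicated `ffill()` and `bfill()` methods produce the same forward and backward fills shown above. This is a minimal sketch on a stand-alone Series that copies the Temperature column; the name `temperature` is illustrative.

```python
import numpy as np
import pandas as pd

temperature = pd.Series([32.0, np.nan, 28.0, np.nan, 32.0, np.nan, np.nan, 34.0, 40.0])

forward = temperature.ffill()                 # carry the last valid value forward
forward_limited = temperature.ffill(limit=1)  # fill at most one consecutive NaN
backward = temperature.bfill()                # pull the next valid value backward

print(forward.tolist())
print(forward_limited.tolist())
print(backward.tolist())
```

Both methods also exist on DataFrames, where they fill each column independently.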

Fill NaN with forward fill method on columns

In [575]:
df.fillna(method="ffill",axis="columns")
Out[575]:
Day Temperature Windspeed Event
0 1/1/2019 32 6 Rain
1 1/4/2019 1/4/2019 9 Sunny
2 1/5/2019 28 28 Snow
3 1/6/2019 1/6/2019 7 7
4 1/7/2019 32 32 Rain
5 1/8/2019 1/8/2019 1/8/2019 Sunny
6 1/9/2019 1/9/2019 1/9/2019 1/9/2019
7 1/10/2019 34 8 Cloudy
8 1/11/2019 40 12 Sunny

Fill NaN with forward fill method on columns with limit

In [576]:
df.fillna(method="ffill",axis="columns",limit=1)
Out[576]:
Day Temperature Windspeed Event
0 1/1/2019 32 6 Rain
1 1/4/2019 1/4/2019 9 Sunny
2 1/5/2019 28 28 Snow
3 1/6/2019 1/6/2019 7 7
4 1/7/2019 32 32 Rain
5 1/8/2019 1/8/2019 NaN Sunny
6 1/9/2019 1/9/2019 NaN NaN
7 1/10/2019 34 8 Cloudy
8 1/11/2019 40 12 Sunny

Fill NaN with backward fill method on columns

In [577]:
df.fillna(method="bfill",axis="columns")
Out[577]:
Day Temperature Windspeed Event
0 1/1/2019 32 6 Rain
1 1/4/2019 9 9 Sunny
2 1/5/2019 28 Snow Snow
3 1/6/2019 7 7 NaN
4 1/7/2019 32 Rain Rain
5 1/8/2019 Sunny Sunny Sunny
6 1/9/2019 NaN NaN NaN
7 1/10/2019 34 8 Cloudy
8 1/11/2019 40 12 Sunny

Fill NaN with backward fill method on columns with limit

In [578]:
df.fillna(method="bfill",axis="columns",limit=1)
Out[578]:
Day Temperature Windspeed Event
0 1/1/2019 32 6 Rain
1 1/4/2019 9 9 Sunny
2 1/5/2019 28 Snow Snow
3 1/6/2019 7 7 NaN
4 1/7/2019 32 Rain Rain
5 1/8/2019 NaN Sunny Sunny
6 1/9/2019 NaN NaN NaN
7 1/10/2019 34 8 Cloudy
8 1/11/2019 40 12 Sunny

Fill NaN using interpolate function

In [579]:
df.interpolate()
# Linear interpolation is the default method
Out[579]:
Day Temperature Windspeed Event
0 1/1/2019 32.000000 6.00 Rain
1 1/4/2019 30.000000 9.00 Sunny
2 1/5/2019 28.000000 8.00 Snow
3 1/6/2019 30.000000 7.00 NaN
4 1/7/2019 32.000000 7.25 Rain
5 1/8/2019 32.666667 7.50 Sunny
6 1/9/2019 33.333333 7.75 NaN
7 1/10/2019 34.000000 8.00 Cloudy
8 1/11/2019 40.000000 12.00 Sunny
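
Linear interpolation spaces the missing values evenly between the nearest valid neighbours. The sketch below repeats the Temperature column as a stand-alone Series (the name `temperature` is illustrative) and checks the two-NaN gap between 32 and 34, which is split into three equal steps of 2/3.

```python
import numpy as np
import pandas as pd

temperature = pd.Series([32.0, np.nan, 28.0, np.nan, 32.0, np.nan, np.nan, 34.0, 40.0])

# method='linear' is the default; it ignores the index and treats rows as equally spaced
filled = temperature.interpolate()

# Rows 4..7 cover the gap between 32 and 34
gap = filled.iloc[4:8].tolist()
print(gap)
```

Only numeric data can be interpolated, which is why the Event column in the output above keeps its NaN values.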

Drop NaN rows or columns

In [580]:
df
Out[580]:
Day Temperature Windspeed Event
0 1/1/2019 32.0 6.0 Rain
1 1/4/2019 NaN 9.0 Sunny
2 1/5/2019 28.0 NaN Snow
3 1/6/2019 NaN 7.0 NaN
4 1/7/2019 32.0 NaN Rain
5 1/8/2019 NaN NaN Sunny
6 1/9/2019 NaN NaN NaN
7 1/10/2019 34.0 8.0 Cloudy
8 1/11/2019 40.0 12.0 Sunny
In [581]:
df.dropna()
Out[581]:
Day Temperature Windspeed Event
0 1/1/2019 32.0 6.0 Rain
7 1/10/2019 34.0 8.0 Cloudy
8 1/11/2019 40.0 12.0 Sunny
In [582]:
df.dropna(axis=0)
Out[582]:
Day Temperature Windspeed Event
0 1/1/2019 32.0 6.0 Rain
7 1/10/2019 34.0 8.0 Cloudy
8 1/11/2019 40.0 12.0 Sunny
In [583]:
df.dropna(axis=1)
Out[583]:
Day
0 1/1/2019
1 1/4/2019
2 1/5/2019
3 1/6/2019
4 1/7/2019
5 1/8/2019
6 1/9/2019
7 1/10/2019
8 1/11/2019

Drop rows or columns only if every element is NaN

In [584]:
df.set_index('Day',inplace=True)
df.dropna(axis=0,how='all')
Out[584]:
Temperature Windspeed Event
Day
1/1/2019 32.0 6.0 Rain
1/4/2019 NaN 9.0 Sunny
1/5/2019 28.0 NaN Snow
1/6/2019 NaN 7.0 NaN
1/7/2019 32.0 NaN Rain
1/8/2019 NaN NaN Sunny
1/10/2019 34.0 8.0 Cloudy
1/11/2019 40.0 12.0 Sunny

Drop NaN rows by applying thresh

Example-1

In [585]:
df.dropna(thresh=2,axis=0)
Out[585]:
Temperature Windspeed Event
Day
1/1/2019 32.0 6.0 Rain
1/4/2019 NaN 9.0 Sunny
1/5/2019 28.0 NaN Snow
1/7/2019 32.0 NaN Rain
1/10/2019 34.0 8.0 Cloudy
1/11/2019 40.0 12.0 Sunny

Example-2

In [586]:
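# thresh is the minimum number of non-NaN values a row must have to be kept.
# Here len(df)*0.2 = 9*0.2 = 1.8, so rows with at least 2 non-NaN values survive.
df.dropna(thresh=len(df)*0.2, axis=0)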
Out[586]:
Temperature Windspeed Event
Day
1/1/2019 32.0 6.0 Rain
1/4/2019 NaN 9.0 Sunny
1/5/2019 28.0 NaN Snow
1/7/2019 32.0 NaN Rain
1/10/2019 34.0 8.0 Cloudy
1/11/2019 40.0 12.0 Sunny
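
To make the meaning of `thresh` concrete, the sketch below counts the non-NaN values per row and shows that `dropna(thresh=2)` keeps exactly the rows with a count of at least 2. The small DataFrame is a hypothetical stand-in for the weather data, and the `subset` call is an extra illustration not used in the cells above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Temperature': [32.0, np.nan, 28.0, np.nan],
    'Windspeed': [6.0, 9.0, np.nan, np.nan],
    'Event': ['Rain', 'Sunny', 'Snow', np.nan],
})

non_null = df.notna().sum(axis=1)   # number of valid values in each row
kept = df.dropna(thresh=2)          # keep rows with at least 2 valid values
same = df[non_null >= 2]            # the same filter written by hand

# subset restricts the NaN check to the listed columns
with_temperature = df.dropna(subset=['Temperature'])
print(kept.index.tolist(), with_temperature.index.tolist())
```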

11.24 Replace values methods

In [587]:
from IPython.display import YouTubeVideo
YouTubeVideo('XOxABiMhG2U',width=900, height=500)
Out[587]:
In [588]:
df= pd.read_csv("Weather2.csv")
df
Out[588]:
Date Temperature Windspeed Event
0 1/1/2019 30 6 Rain
1 1/2/2019 32 7 Sunny
2 1/3/2019 34 n.a Rain
3 1/4/2019 36 12 Sunny
4 1/5/2019 Not Available 13 Rain
In [589]:
df= pd.read_csv('Weather2.csv',na_values=['Not Available',"n.a"])
df
Out[589]:
Date Temperature Windspeed Event
0 1/1/2019 30.0 6.0 Rain
1 1/2/2019 32.0 7.0 Sunny
2 1/3/2019 34.0 NaN Rain
3 1/4/2019 36.0 12.0 Sunny
4 1/5/2019 NaN 13.0 Rain

Solution-2

In [590]:
df= pd.read_csv("Stocks.csv")
df
Out[590]:
Tickers EPS Revenue Price People
0 Google 27.82 84 845 larry page
1 WMT 4.61 484 65 n.a.
2 MSFT -1 85 64 bill gates
3 RIL not available 50 1023 mukesh ambani
4 TATA 5.6 -1 n.a ratan tata
In [591]:
df= pd.read_csv("Stocks.csv",na_values=['not available',"n.a","n.a."])
df
Out[591]:
Tickers EPS Revenue Price People
0 Google 27.82 84 845.0 larry page
1 WMT 4.61 484 65.0 NaN
2 MSFT -1.00 85 64.0 bill gates
3 RIL NaN 50 1023.0 mukesh ambani
4 TATA 5.60 -1 NaN ratan tata
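  • Revenue can never be negative, so -1 in the Revenue column should also be treated as missing. EPS can legitimately be negative, so the rule is applied per column: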
In [592]:
df= pd.read_csv('Stocks.csv',na_values={'EPS':["not available","n.a","n.a."],
                                        'Revenue':["not available","n.a","n.a.",-1],
                                        'Price':["not available","n.a","n.a."],
                                        'People':["not available","n.a","n.a."]
                                        })
df
Out[592]:
Tickers EPS Revenue Price People
0 Google 27.82 84.0 845.0 larry page
1 WMT 4.61 484.0 65.0 NaN
2 MSFT -1.00 85.0 64.0 bill gates
3 RIL NaN 50.0 1023.0 mukesh ambani
4 TATA 5.60 NaN NaN ratan tata

Solution-3

converters

In [593]:
df=pd.read_excel("Stocks_ex.xlsx","Stocks")
df
Out[593]:
Tickers EPS Revenue Price People
0 Google 27.82 84 845 larry page
1 WMT 4.61 484 65 n.a.
2 MSFT -1 85 64 bill gates
3 RIL not available 50 1023 mukesh ambani
4 TATA 5.6 -1 n.a ratan tata

Example-1

In [594]:
def convert_people_cell(cell):
    if cell=="n.a.":
        return 'Sam Walton'
    return cell
In [595]:
df=pd.read_excel("Stocks_ex.xlsx","Stocks",converters={
        'People':convert_people_cell
         })
df
Out[595]:
Tickers EPS Revenue Price People
0 Google 27.82 84 845 larry page
1 WMT 4.61 484 65 Sam Walton
2 MSFT -1 85 64 bill gates
3 RIL not available 50 1023 mukesh ambani
4 TATA 5.6 -1 n.a ratan tata

Example-2

In [596]:
def convert_people_cell(cell):
    if cell=="n.a.":
        return 'Sam Walton'
    return cell
def convert_revenue_cell(cell):
    if cell<0:
        return np.nan
    return cell
In [597]:
df=pd.read_excel("Stocks_ex.xlsx","Stocks",converters={
        'People':convert_people_cell,
        'Revenue':convert_revenue_cell
        })
df
Out[597]:
Tickers EPS Revenue Price People
0 Google 27.82 84 845 larry page
1 WMT 4.61 484 65 Sam Walton
2 MSFT -1 85 64 bill gates
3 RIL not available 50 1023 mukesh ambani
4 TATA 5.6 NaN n.a ratan tata

Solution-4

Replace

Example-1

In [598]:
df= pd.read_csv('Replace_DS.csv')
df
Out[598]:
Day Temperature Windspeed Event
0 1-Jan-19 32 6 Rain
1 2-Jan-19 -999999 7 Sunny
2 3-Jan-19 28 -999999 Snow
3 4-Jan-19 -999999 7 No Event
4 5-Jan-19 32 -999999 Rain
5 6-Jan-19 31 2 Sunny
6 7-Jan-19 34 5 No Event
In [599]:
df.replace(-999999,np.nan)
Out[599]:
Day Temperature Windspeed Event
0 1-Jan-19 32.0 6.0 Rain
1 2-Jan-19 NaN 7.0 Sunny
2 3-Jan-19 28.0 NaN Snow
3 4-Jan-19 NaN 7.0 No Event
4 5-Jan-19 32.0 NaN Rain
5 6-Jan-19 31.0 2.0 Sunny
6 7-Jan-19 34.0 5.0 No Event

Example-2

In [600]:
df.replace({
        -999999:np.nan,
        'No Event':np.nan})
Out[600]:
Day Temperature Windspeed Event
0 1-Jan-19 32.0 6.0 Rain
1 2-Jan-19 NaN 7.0 Sunny
2 3-Jan-19 28.0 NaN Snow
3 4-Jan-19 NaN 7.0 NaN
4 5-Jan-19 32.0 NaN Rain
5 6-Jan-19 31.0 2.0 Sunny
6 7-Jan-19 34.0 5.0 NaN

Example-3

In [601]:
df= pd.read_csv('Replace_DS1.csv')
df
Out[601]:
Day Temperature Windspeed Event
0 1-Jan-19 32 6 Rain
1 2-Jan-19 -999999 7 Sunny
2 3-Jan-19 28 -999999 Snow
3 4-Jan-19 -999999 7 No Event
4 5-Jan-19 32 -888888 Rain
5 6-Jan-19 31 2 Sunny
6 7-Jan-19 34 5 No Event
In [602]:
df.replace([-999999,-888888,'No Event'],np.nan)
Out[602]:
Day Temperature Windspeed Event
0 1-Jan-19 32.0 6.0 Rain
1 2-Jan-19 NaN 7.0 Sunny
2 3-Jan-19 28.0 NaN Snow
3 4-Jan-19 NaN 7.0 NaN
4 5-Jan-19 32.0 NaN Rain
5 6-Jan-19 31.0 2.0 Sunny
6 7-Jan-19 34.0 5.0 NaN

Example-4

In [603]:
df= pd.read_csv('Replace_DS2.csv')
df
Out[603]:
Day Temperature Windspeed Event
0 1-Jan-19 32 6 Rain
1 2-Jan-19 -999999 7 Sunny
2 3-Jan-19 28 -888888 Snow
3 4-Jan-19 -999999 7 0
4 5-Jan-19 32 -888888 Rain
5 6-Jan-19 31 2 Sunny
6 7-Jan-19 34 5 0
In [604]:
df.replace({
        'Temperature':-999999,
        'Windspeed':-888888,
        'Event':'0'},np.nan)
Out[604]:
Day Temperature Windspeed Event
0 1-Jan-19 32.0 6.0 Rain
1 2-Jan-19 NaN 7.0 Sunny
2 3-Jan-19 28.0 NaN Snow
3 4-Jan-19 NaN 7.0 NaN
4 5-Jan-19 32.0 NaN Rain
5 6-Jan-19 31.0 2.0 Sunny
6 7-Jan-19 34.0 5.0 NaN

Example-5

In [605]:
df= pd.read_csv('Replace_DS3.csv')
df
Out[605]:
Day Temperature Windspeed Event
0 1-Jan-19 32F 6MPH Rain
1 2-Jan-19 -999999 7 Sunny
2 3-Jan-19 28C -999999 Snow
3 4-Jan-19 -999999 7 No Event
4 5-Jan-19 32 -999999 Rain
5 6-Jan-19 31 2 Sunny
6 7-Jan-19 34 5MPH No Event
In [606]:
df.replace('[A-Za-z]','',regex=True)
Out[606]:
Day Temperature Windspeed Event
0 1--19 32 6
1 2--19 -999999 7
2 3--19 28 -999999
3 4--19 -999999 7
4 5--19 32 -999999
5 6--19 31 2
6 7--19 34 5
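  • The statement above is needed for Temperature and Windspeed, but it also strips the letters from Day and Event, which is not wanted. Restrict the replacement to the two numeric columns: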
In [607]:
df.replace({
        'Temperature':'[A-Za-z]',
        'Windspeed':'[A-Za-z]'},'',regex=True)
Out[607]:
Day Temperature Windspeed Event
0 1-Jan-19 32 6 Rain
1 2-Jan-19 -999999 7 Sunny
2 3-Jan-19 28 -999999 Snow
3 4-Jan-19 -999999 7 No Event
4 5-Jan-19 32 -999999 Rain
5 6-Jan-19 31 2 Sunny
6 7-Jan-19 34 5 No Event
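
After the unit letters are stripped, the columns still hold text and the -999999 sentinel. A possible follow-up, sketched here on a hypothetical three-row extract of the data, is to convert the columns with `pd.to_numeric` and then replace the sentinel with NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Temperature': ['32F', '-999999', '28C'],
    'Windspeed': ['6MPH', '7', '-999999'],
})

# Step 1: strip letters from the two measurement columns only
cleaned = df.replace({'Temperature': '[A-Za-z]', 'Windspeed': '[A-Za-z]'}, '', regex=True)

# Step 2: turn the remaining text into numbers, then mark the sentinel as missing
numeric = cleaned.apply(pd.to_numeric).replace(-999999, np.nan)
print(numeric)
```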

mask

  • mask returns an object of the same shape as the original: entries where cond is False are kept, and entries where cond is True are replaced with other.

Example-1

In [608]:
df = pd.DataFrame({"A":[12, 4, 5, 44, 1], 
                   "B":[5, 2, 54, 3, 2], 
                   "C":[20, 16, 7, 3, 8], 
                   "D":[14, 3, 17, 2, 6]})
df
Out[608]:
A B C D
0 12 5 20 14
1 4 2 16 3
2 5 54 7 17
3 44 3 3 2
4 1 2 8 6
In [609]:
df.mask(df > 10, -25)
Out[609]:
A B C D
0 -25 5 -25 -25
1 4 2 -25 3
2 5 -25 7 -25
3 -25 3 3 2
4 1 2 8 6

Example-2

In [610]:
df = pd.DataFrame({"A":[12, 4, 5, None, 1], 
                   "B":[7, 2, 54, 3, None], 
                   "C":[20, 16, 11, 3, 8], 
                   "D":[14, 3, None, 2, 6]})
df
Out[610]:
A B C D
0 12.0 7.0 20 14.0
1 4.0 2.0 16 3.0
2 5.0 54.0 11 NaN
3 NaN 3.0 3 2.0
4 1.0 NaN 8 6.0
In [611]:
df.mask(df.isna(), 1000)
Out[611]:
A B C D
0 12.0 7.0 20 14.0
1 4.0 2.0 16 3.0
2 5.0 54.0 11 1000.0
3 1000.0 3.0 3 2.0
4 1.0 1000.0 8 6.0
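
`where` is the counterpart of `mask`: it keeps entries where the condition is True and replaces the rest. The sketch below, on a small illustrative DataFrame, shows that masking with a condition gives the same result as `where` with the inverted condition.

```python
import pandas as pd

df = pd.DataFrame({'A': [12, 4, 5], 'B': [5, 2, 54]})

masked = df.mask(df > 10, -25)   # replace where the condition is True
kept = df.where(df <= 10, -25)   # keep where the condition is True

print(masked)
```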

clip_lower

  • (DEPRECATED; removed in pandas 1.0) Returns a copy of the input with values below a threshold raised to that threshold. In current pandas, use clip(lower=...) instead.
In [612]:
df=pd.DataFrame({'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]})
df
Out[612]:
col_0 col_1
0 9 -2
1 -3 -7
2 0 6
3 -1 8
4 5 -5
In [613]:
df.clip_lower(3)
Out[613]:
col_0 col_1
0 9 3
1 3 3
2 3 6
3 3 8
4 5 3

clip_upper

  • (DEPRECATED; removed in pandas 1.0) Returns a copy of the input with values above a threshold lowered to that threshold. In current pandas, use clip(upper=...) instead.
In [614]:
df=pd.DataFrame({'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]})
df
Out[614]:
col_0 col_1
0 9 -2
1 -3 -7
2 0 6
3 -1 8
4 5 -5
In [615]:
df.clip_upper(3)
Out[615]:
col_0 col_1
0 3 -2
1 -3 -7
2 0 3
3 -1 3
4 3 -5

clip

  • Trim values at input threshold(s).
  • Assigns values outside boundary to boundary values.
  • Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis.
In [616]:
df=pd.DataFrame({'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]})
df
Out[616]:
col_0 col_1
0 9 -2
1 -3 -7
2 0 6
3 -1 8
4 5 -5
In [617]:
df.clip(-4, 6)
Out[617]:
col_0 col_1
0 6 -2
1 -3 -4
2 0 6
3 -1 6
4 5 -4
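
The deprecated `clip_lower` and `clip_upper` calls shown earlier can be expressed with `clip` by passing only one bound. A minimal sketch on the same data (the result names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]})

at_least_3 = df.clip(lower=3)   # replacement for df.clip_lower(3)
at_most_3 = df.clip(upper=3)    # replacement for df.clip_upper(3)

print(at_least_3)
print(at_most_3)
```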

combine

  • combine() merges two Series (or two DataFrames) into one, using a function passed as a parameter.
  • For Series, the function receives one pair of elements at a time; for DataFrames, it receives one pair of columns at a time (as in Example-1).
  • Series are aligned on their index first. An index label missing from one side is treated as fill_value (NaN by default), so the two inputs do not have to be the same length.

Example-1

In [618]:
df1 = pd.DataFrame({'A': [0, 2], 'B': [4, 4]})
df1
Out[618]:
A B
0 0 4
1 2 4
In [619]:
df2 = pd.DataFrame({'A': [0, 1], 'B': [8, 8]})
df2
Out[619]:
A B
0 0 8
1 1 8
In [620]:
df1.combine(df2, lambda s1, s2: s1 if s1.sum() < s2.sum() else s2)
Out[620]:
A B
0 0 4
1 1 4

Example-2

In [621]:
df1 =pd.Series([1, 2, 5, 6, 3, 7, 11, 0, 4])
df1
Out[621]:
0     1
1     2
2     5
3     6
4     3
5     7
6    11
7     0
8     4
dtype: int64
In [622]:
df2 =pd.Series([5, 3, 2, 1, 3, 9, 21, 3, 1])
df2
Out[622]:
0     5
1     3
2     2
3     1
4     3
5     9
6    21
7     3
8     1
dtype: int64
In [623]:
df1.combine(df2, (lambda x1, x2: x1 if x1 < x2 else x2))
Out[623]:
0     1
1     2
2     2
3     1
4     3
5     7
6    11
7     0
8     1
dtype: int64
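
`Series.combine` also accepts Series of different lengths. The sketch below uses two short illustrative Series and `fill_value=0`, so the label that exists only in the second Series is compared against 0.

```python
import pandas as pd

s1 = pd.Series([1, 2, 5])
s2 = pd.Series([5, 3, 2, 1])

# The result index is the union of both indexes (0..3).
# Label 3 is missing from s1, so it is treated as fill_value (0) there.
smallest = s1.combine(s2, min, fill_value=0)
print(smallest.tolist())
```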

combine_first

  • combine_first() fills the missing (NaN) values of the calling object with the values found at the same labels in the other object.
  • Below, two Series containing np.nan are created with pd.Series().
  • series1.combine_first(series2) is stored in series3, and series2.combine_first(series1) is stored in series4.
  • Comparing the two results shows that the caller's non-missing values always take priority.
In [624]:
series1 = pd.Series([70, 5, 0, 225, 1, 16, np.nan, 10, np.nan]) 
series2 = pd.Series([27, np.nan, 2, 23, 1, 95, 53, 10, 5])
series3 = series1.combine_first(series2)
series3
Out[624]:
0     70.0
1      5.0
2      0.0
3    225.0
4      1.0
5     16.0
6     53.0
7     10.0
8      5.0
dtype: float64
In [625]:
series4 = series2.combine_first(series1)
series4
Out[625]:
0    27.0
1     5.0
2     2.0
3    23.0
4     1.0
5    95.0
6    53.0
7    10.0
8     5.0
dtype: float64
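
When both Series share the same index, `combine_first` gives the same result as `fillna` with the other Series. A minimal sketch with short illustrative Series:

```python
import numpy as np
import pandas as pd

series1 = pd.Series([70, np.nan, 0, np.nan])
series2 = pd.Series([27, 5, np.nan, 9])

# Values from series1 win; series2 is used only where series1 is NaN
first = series1.combine_first(series2)
same = series1.fillna(series2)
print(first.tolist())
```

The two differ when the indexes differ: `combine_first` returns the union of both indexes, while `fillna` keeps only the caller's index.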

update

In [626]:
df= pd.DataFrame({'A': ['a', 'b', 'c'],
                    'B': ['x', 'y', 'z']})
df
Out[626]:
A B
0 a x
1 b y
2 c z
In [627]:
df1 = pd.DataFrame({'B': ['d', 'e', 'f', 'g', 'h', 'i']})
df1
Out[627]:
B
0 d
1 e
2 f
3 g
4 h
5 i
In [628]:
df.update(df1)
In [629]:
df
Out[629]:
A B
0 a d
1 b e
2 c f
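
Three details of `update` are easy to miss in the cells above: it modifies the DataFrame in place and returns None, it aligns on index and column labels (which is why rows 3 to 5 of df1 are ignored), and NaN values in the other DataFrame do not overwrite existing values. A minimal sketch, with `other` as an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': ['x', 'y', 'z']})
other = pd.DataFrame({'B': ['d', np.nan, 'f']})

result = df.update(other)   # in place: result is None, df itself changes
print(df)
```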

11.25 Variable types handling

In [630]:
df= pd.read_csv('List.csv')
df
Out[630]:
Score Student
0 Exceptional Irfan
1 V Good Ali
2 Good Ahmed
3 Poor Kamran
4 V Good Atif
5 Exceptional Mansoor
In [631]:
df.sort_values("Score")
Out[631]:
Score Student
0 Exceptional Irfan
5 Exceptional Mansoor
2 Good Ahmed
3 Poor Kamran
1 V Good Ali
4 V Good Atif
In [632]:
from pandas.api.types import CategoricalDtype
df['Score'] = df['Score'].astype(CategoricalDtype(categories=['Poor','Good','V Good','Exceptional'],ordered=True))
df
Out[632]:
Score Student
0 Exceptional Irfan
1 V Good Ali
2 Good Ahmed
3 Poor Kamran
4 V Good Atif
5 Exceptional Mansoor
In [633]:
df.sort_values("Score")
Out[633]:
Score Student
3 Poor Kamran
2 Good Ahmed
1 V Good Ali
4 V Good Atif
0 Exceptional Irfan
5 Exceptional Mansoor
In [634]:
df.loc[df.Score >= 'Good']
Out[634]:
Score Student
0 Exceptional Irfan
1 V Good Ali
2 Good Ahmed
4 V Good Atif
5 Exceptional Mansoor
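
An ordered categorical stores each value as an integer code that follows the declared category order, which is what makes the sorting and the `>=` comparison above work. A minimal sketch on four illustrative scores:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

score_type = CategoricalDtype(
    categories=['Poor', 'Good', 'V Good', 'Exceptional'], ordered=True)
scores = pd.Series(['Exceptional', 'V Good', 'Good', 'Poor']).astype(score_type)

codes = scores.cat.codes.tolist()             # position of each value in the order
at_least_good = (scores >= 'Good').tolist()   # comparison uses the declared order
print(codes, at_least_good)
```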

get_dummies

  • Many data science models, linear regression for example, need a categorical variable as separate indicator (0/1) columns. pd.get_dummies creates one such column per category.
In [635]:
from IPython.display import YouTubeVideo
YouTubeVideo('0s_1IsROgDc',width=900, height=500)
Out[635]:
In [636]:
pd.get_dummies(df, columns=['Score'])
Out[636]:
Student Score_Poor Score_Good Score_V Good Score_Exceptional
0 Irfan 0 0 0 1
1 Ali 0 0 1 0
2 Ahmed 0 1 0 0
3 Kamran 1 0 0 0
4 Atif 0 0 1 0
5 Mansoor 0 0 0 1
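
In pandas 2.0 and later, `get_dummies` returns boolean (True/False) columns by default. To get the 0/1 integers shown above, pass `dtype=int`. A minimal sketch on a hypothetical three-row extract, where Score is plain text rather than a categorical:

```python
import pandas as pd

df = pd.DataFrame({
    'Student': ['Irfan', 'Ali', 'Kamran'],
    'Score': ['Exceptional', 'V Good', 'Poor'],
})

# dtype=int forces 0/1 indicator columns regardless of the pandas version
dummies = pd.get_dummies(df, columns=['Score'], dtype=int)
print(dummies)
```

For a plain text column the indicator columns come out in alphabetical order; for a categorical column, as in the cell above, they follow the category order.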

Nominal Categories

In [637]:
df= pd.read_csv('Gender.csv')
df
Out[637]:
Name Gender
0 Umer Saeed Male
1 Ali Saeed Male
2 Ahmed Saeed Male
3 Sarah Saeed Female
4 Khansa Saeed Female
In [638]:
df['Sex']=df.Gender.astype("category").cat.codes
df
Out[638]:
Name Gender Sex
0 Umer Saeed Male 1
1 Ali Saeed Male 1
2 Ahmed Saeed Male 1
3 Sarah Saeed Female 0
4 Khansa Saeed Female 0

map

In [639]:
df= pd.read_csv('Gender.csv')
df
Out[639]:
Name Gender
0 Umer Saeed Male
1 Ali Saeed Male
2 Ahmed Saeed Male
3 Sarah Saeed Female
4 Khansa Saeed Female
In [640]:
df['Sex_num'] = df.Gender.map({'Female':0, 'Male':1})
df
Out[640]:
Name Gender Sex_num
0 Umer Saeed Male 1
1 Ali Saeed Male 1
2 Ahmed Saeed Male 1
3 Sarah Saeed Female 0
4 Khansa Saeed Female 0
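
One thing to watch with `map` and a dictionary: any value that has no key in the dictionary becomes NaN. The sketch below adds a hypothetical 'Unknown' entry to show this.

```python
import pandas as pd

gender = pd.Series(['Male', 'Female', 'Unknown'])

# 'Unknown' has no entry in the mapping, so it maps to NaN
mapped = gender.map({'Female': 0, 'Male': 1})
unmapped = gender[mapped.isna()].tolist()
print(mapped.tolist(), unmapped)
```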

Example

In [641]:
df_who= pd.read_csv('WHO_csv.csv')
In [642]:
df_who.memory_usage()
Out[642]:
Index                             128
Country                          1552
Region                           1552
Population                       1552
Under15                          1552
Over60                           1552
FertilityRate                    1552
LifeExpectancy                   1552
ChildMortality                   1552
CellularSubscribers              1552
LiteracyRate                     1552
GNI                              1552
PrimarySchoolEnrollmentMale      1552
PrimarySchoolEnrollmentFemale    1552
dtype: int64
In [643]:
df_who= pd.read_csv('WHO_csv.csv',dtype = {"Region" : "category"})
In [644]:
df_who.dtypes
Out[644]:
Country                            object
Region                           category
Population                          int64
Under15                           float64
Over60                            float64
FertilityRate                     float64
LifeExpectancy                      int64
ChildMortality                    float64
CellularSubscribers               float64
LiteracyRate                      float64
GNI                               float64
PrimarySchoolEnrollmentMale       float64
PrimarySchoolEnrollmentFemale     float64
dtype: object
In [645]:
df_who.memory_usage()
Out[645]:
Index                             128
Country                          1552
Region                            402
Population                       1552
Under15                          1552
Over60                           1552
FertilityRate                    1552
LifeExpectancy                   1552
ChildMortality                   1552
CellularSubscribers              1552
LiteracyRate                     1552
GNI                              1552
PrimarySchoolEnrollmentMale      1552
PrimarySchoolEnrollmentFemale    1552
dtype: int64

Example

In [646]:
df_who= pd.read_csv('WHO_csv.csv',dtype = {"Region" : "category",'Country':'category'})
In [647]:
df_who.dtypes
Out[647]:
Country                          category
Region                           category
Population                          int64
Under15                           float64
Over60                            float64
FertilityRate                     float64
LifeExpectancy                      int64
ChildMortality                    float64
CellularSubscribers               float64
LiteracyRate                      float64
GNI                               float64
PrimarySchoolEnrollmentMale       float64
PrimarySchoolEnrollmentFemale     float64
dtype: object
In [648]:
df_who.memory_usage()
Out[648]:
Index                             128
Country                          7060
Region                            402
Population                       1552
Under15                          1552
Over60                           1552
FertilityRate                    1552
LifeExpectancy                   1552
ChildMortality                   1552
CellularSubscribers              1552
LiteracyRate                     1552
GNI                              1552
PrimarySchoolEnrollmentMale      1552
PrimarySchoolEnrollmentFemale    1552
dtype: int64
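
In the output above, Country grows from 1552 to 7060 bytes when converted to category, while Region shrinks. A likely explanation: `memory_usage()` without `deep=True` counts only the 8-byte references of a text column, not the strings themselves, and a categorical column with almost all-unique values must store every category plus the codes. The sketch below uses hypothetical stand-in data and `deep=True` to compare the two cases.

```python
import pandas as pd

# Hypothetical stand-in data: 200 unique names vs. 4 repeated region labels
countries = pd.Series([f'Country {i}' for i in range(200)])
regions = pd.Series(['Africa', 'Europe', 'Asia', 'Americas'] * 50)

def deep_bytes(series):
    # deep=True also counts the string contents; index=False leaves the index out
    return series.memory_usage(index=False, deep=True)

country_plain = deep_bytes(countries)
country_category = deep_bytes(countries.astype('category'))
region_plain = deep_bytes(regions)
region_category = deep_bytes(regions.astype('category'))

print(country_plain, country_category)
print(region_plain, region_category)
```

Category dtype pays off for columns with few distinct values that repeat often; for near-unique columns it adds overhead.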

Convert continuous data into categorical data

In [649]:
df= pd.read_csv('titanic.csv')
df.head()
Out[649]:
PassengerId Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1246 3 Dean, Miss. Elizabeth Gladys Millvina"" female 0.17 1 2 C.A. 2315 20.575 NaN S
1 1093 3 Danbom, Master. Gilbert Sigvard Emanuel male 0.33 0 2 347080 14.400 NaN S
2 1173 3 Peacock, Master. Alfred Edward male 0.75 1 1 SOTON/O.Q. 3101315 13.775 NaN S
3 1199 3 Aks, Master. Philip Frank male 0.83 0 1 392091 9.350 NaN S
4 1142 2 West, Miss. Barbara J female 0.92 1 2 C.A. 34651 27.750 NaN S
  • Age is currently continuous data, but it can be converted into categorical data.
  • One solution is to label the age ranges, such as "child", "young adult", and "adult". A convenient way to do this is the cut() function:
In [650]:
df['cage']=pd.cut(df.Age, bins=[0, 18, 25, 99], labels=['child', 'young adult', 'adult'])
In [651]:
df.head()
Out[651]:
PassengerId Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked cage
0 1246 3 Dean, Miss. Elizabeth Gladys Millvina"" female 0.17 1 2 C.A. 2315 20.575 NaN S child
1 1093 3 Danbom, Master. Gilbert Sigvard Emanuel male 0.33 0 2 347080 14.400 NaN S child
2 1173 3 Peacock, Master. Alfred Edward male 0.75 1 1 SOTON/O.Q. 3101315 13.775 NaN S child
3 1199 3 Aks, Master. Philip Frank male 0.83 0 1 392091 9.350 NaN S child
4 1142 2 West, Miss. Barbara J female 0.92 1 2 C.A. 34651 27.750 NaN S child
In [652]:
df.tail()
Out[652]:
PassengerId Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked cage
327 1071 1 Compton, Mrs. Alexander Taylor (Mary Eliza Ing... female 64.0 0 2 PC 17756 83.1583 E45 C adult
328 1128 1 Warren, Mr. Frank Manley male 64.0 1 0 110813 75.2500 D37 C adult
329 1197 1 Crosby, Mrs. Edward Gifford (Catherine Elizabe... female 64.0 1 1 112901 26.5500 B26 S adult
330 973 1 Straus, Mr. Isidor male 67.0 1 0 PC 17483 221.7792 C55 C57 S adult
331 988 1 Cavendish, Mrs. Tyrell William (Julia Florence... female 76.0 1 0 19877 78.8500 C46 S adult
In [653]:
df['cage'].dtypes
Out[653]:
CategoricalDtype(categories=['child', 'young adult', 'adult'], ordered=True)
  • This assigned each value to a bin with a label. By default the bins are open on the left and closed on the right:
    • ages above 0 and up to 18 were assigned the label "child",
    • ages above 18 and up to 25 were assigned the label "young adult",
    • and ages above 25 and up to 99 were assigned the label "adult".
  • Notice that the data type is now "category", and the categories are automatically ordered.
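
The bin edges matter for values that fall exactly on a boundary. The sketch below uses a few illustrative ages to compare the default right-closed bins with `right=False`, which closes the bins on the left instead.

```python
import pandas as pd

ages = pd.Series([0.17, 18, 18.5, 25, 64])
bins = [0, 18, 25, 99]
labels = ['child', 'young adult', 'adult']

# Default: (0, 18], (18, 25], (25, 99]  -> 18 is still a child
right_closed = pd.cut(ages, bins=bins, labels=labels)

# right=False: [0, 18), [18, 25), [25, 99)  -> 18 is a young adult
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

print(right_closed.tolist())
print(left_closed.tolist())
```

Values outside all bins (an age of 0 with the default setting, for example) become NaN.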

qcut

  • Quantile-based discretization function.
  • Discretize variable into equal-sized buckets based on rank or based on sample quantiles.
  • For example 1000 values for 10 quantiles would produce a Categorical object indicating quantile membership for each data point.
In [654]:
df= pd.qcut(range(5), 4)
df
Out[654]:
[(-0.001, 1.0], (-0.001, 1.0], (1.0, 2.0], (2.0, 3.0], (3.0, 4.0]]
Categories (4, interval[float64]): [(-0.001, 1.0] < (1.0, 2.0] < (2.0, 3.0] < (3.0, 4.0]]
In [655]:
df=pd.qcut(range(5), 3, labels=["good", "medium", "bad"])
df
Out[655]:
[good, good, medium, bad, bad]
Categories (3, object): [good < medium < bad]
In [656]:
df=pd.qcut(range(5), 4, labels=False)
df
Out[656]:
array([0, 0, 1, 2, 3], dtype=int64)
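
Unlike `cut`, which uses fixed edges, `qcut` chooses the edges so that each bucket holds roughly the same number of values. A minimal sketch on the integers 1 to 100, with illustrative quartile labels:

```python
import pandas as pd

values = pd.Series(range(1, 101))

# Four quantile buckets, each labelled Q1..Q4
quartile = pd.qcut(values, 4, labels=['Q1', 'Q2', 'Q3', 'Q4'])

# Count how many values landed in each bucket, in category order
counts = quartile.value_counts().sort_index().tolist()
print(counts)
```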

11.26 CONCATENATE

In [657]:
from IPython.display import YouTubeVideo
YouTubeVideo('WGOEFok1szA',width=900, height=500)
Out[657]:

CONCATENATE w.r.t. rows

In [658]:
df1= pd.read_csv('KPK_Weather.csv')
df1
Out[658]:
City Day Event Temperature Windspeed
0 Peshawar 1/1/2019 Rain 38 16
1 Abbottabad 1/4/2019 Sunny 46 20
2 Kohat 1/3/2019 Sunny 42 19
3 Dir 1/2/2019 Rain 40 17
In [659]:
df2= pd.read_csv('Punjab_Weather.csv')
df2
Out[659]:
City Day Event Temperature Windspeed
0 Lahore 1/1/2019 Rain 30 6
1 Multan 1/2/2019 Sunny 32 7
2 Faisalabad 1/3/2019 Rain 34 10
3 Jhelum 1/4/2019 Sunny 36 12
In [660]:
df=pd.concat([df1,df2])
df
Out[660]:
City Day Event Temperature Windspeed
0 Peshawar 1/1/2019 Rain 38 16
1 Abbottabad 1/4/2019 Sunny 46 20
2 Kohat 1/3/2019 Sunny 42 19
3 Dir 1/2/2019 Rain 40 17
0 Lahore 1/1/2019 Rain 30 6
1 Multan 1/2/2019 Sunny 32 7
2 Faisalabad 1/3/2019 Rain 34 10
3 Jhelum 1/4/2019 Sunny 36 12

Solution for Index

In [661]:
df=pd.concat([df1,df2],ignore_index=True)
df
Out[661]:
City Day Event Temperature Windspeed
0 Peshawar 1/1/2019 Rain 38 16
1 Abbottabad 1/4/2019 Sunny 46 20
2 Kohat 1/3/2019 Sunny 42 19
3 Dir 1/2/2019 Rain 40 17
4 Lahore 1/1/2019 Rain 30 6
5 Multan 1/2/2019 Sunny 32 7
6 Faisalabad 1/3/2019 Rain 34 10
7 Jhelum 1/4/2019 Sunny 36 12

keys

In [662]:
df=pd.concat([df1,df2],keys=['KPK','Punjab'])
df
Out[662]:
City Day Event Temperature Windspeed
KPK 0 Peshawar 1/1/2019 Rain 38 16
1 Abbottabad 1/4/2019 Sunny 46 20
2 Kohat 1/3/2019 Sunny 42 19
3 Dir 1/2/2019 Rain 40 17
Punjab 0 Lahore 1/1/2019 Rain 30 6
1 Multan 1/2/2019 Sunny 32 7
2 Faisalabad 1/3/2019 Rain 34 10
3 Jhelum 1/4/2019 Sunny 36 12
In [663]:
df.loc['Punjab']
Out[663]:
City Day Event Temperature Windspeed
0 Lahore 1/1/2019 Rain 30 6
1 Multan 1/2/2019 Sunny 32 7
2 Faisalabad 1/3/2019 Rain 34 10
3 Jhelum 1/4/2019 Sunny 36 12
In [664]:
df.loc['KPK']
Out[664]:
City Day Event Temperature Windspeed
0 Peshawar 1/1/2019 Rain 38 16
1 Abbottabad 1/4/2019 Sunny 46 20
2 Kohat 1/3/2019 Sunny 42 19
3 Dir 1/2/2019 Rain 40 17
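
The keys can also be turned into an ordinary column, which is often easier to filter and group by than a two-level index. The sketch below uses two small hypothetical frames; the level names 'Province' and 'Row' are illustrative.

```python
import pandas as pd

kpk = pd.DataFrame({'City': ['Peshawar', 'Dir'], 'Temperature': [38, 40]})
punjab = pd.DataFrame({'City': ['Lahore', 'Multan'], 'Temperature': [30, 32]})

# names labels the two index levels: the key level and the original row level
combined = pd.concat([kpk, punjab], keys=['KPK', 'Punjab'], names=['Province', 'Row'])

# Move the key level into a column and renumber the rows 0..n-1
flat = combined.reset_index(level='Province').reset_index(drop=True)
print(flat)
```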

CONCATENATE w.r.t col

Example-1

In [665]:
df1= pd.read_csv('WTemData.csv')
df1
Out[665]:
City Temperature
0 Lahore 30
1 Multan 32
2 Faisalabad 34
3 Jhelum 36
In [666]:
df2= pd.read_csv('WWSData.csv')
df2
Out[666]:
City WindSpeed
0 Lahore 7
1 Multan 8
2 Faisalabad 9
3 Jhelum 10
In [667]:
df=pd.concat([df1,df2])
df
Out[667]:
City Temperature WindSpeed
0 Lahore 30.0 NaN
1 Multan 32.0 NaN
2 Faisalabad 34.0 NaN
3 Jhelum 36.0 NaN
0 Lahore NaN 7.0
1 Multan NaN 8.0
2 Faisalabad NaN 9.0
3 Jhelum NaN 10.0

Solution

In [668]:
df=pd.concat([df1,df2],axis=1)
df
Out[668]:
City Temperature City WindSpeed
0 Lahore 30 Lahore 7
1 Multan 32 Multan 8
2 Faisalabad 34 Faisalabad 9
3 Jhelum 36 Jhelum 10

Example-2

In [669]:
df1=pd.DataFrame({
'City':['LHR','SHD','GJW','SKT'],
'temperature':[30,32,34,36]})
df1
Out[669]:
City temperature
0 LHR 30
1 SHD 32
2 GJW 34
3 SKT 36
In [670]:
df2=pd.DataFrame({
'City':['SKT','GJW','SHD','LHR'],
'windspeed':[7,8,9,10]})
df2
Out[670]:
City windspeed
0 SKT 7
1 GJW 8
2 SHD 9
3 LHR 10
In [671]:
df=pd.concat([df1,df2],axis=1)
df
Out[671]:
City temperature City windspeed
0 LHR 30 SKT 7
1 SHD 32 GJW 8
2 GJW 34 SHD 9
3 SKT 36 LHR 10

Solution

In [672]:
df1=pd.DataFrame({
'City':['LHR','SHD','GJW','SKT'],
'temperature':[30,32,34,36]},index=[0,1,2,3])
df1
Out[672]:
City temperature
0 LHR 30
1 SHD 32
2 GJW 34
3 SKT 36
In [673]:
df2=pd.DataFrame({
'City':['SKT','GJW','SHD','LHR'],
'windspeed':[7,8,9,10]},index=[3,2,1,0])
df2
Out[673]:
City windspeed
3 SKT 7
2 GJW 8
1 SHD 9
0 LHR 10
In [674]:
df=pd.concat([df1,df2],axis=1)
df
Out[674]:
City temperature City windspeed
0 LHR 30 LHR 10
1 SHD 32 SHD 9
2 GJW 34 GJW 8
3 SKT 36 SKT 7

11.27 Merge Dataframes

In [675]:
from IPython.display import YouTubeVideo
YouTubeVideo('h4hOPGo4UVU',width=900, height=500)
Out[675]:
  • pd.merge combines two DataFrames on one or more key columns (or on their indexes), in the style of a database join.
  • The how parameter selects the join type: inner (the default), outer, left or right.

Example-1

In [676]:
df1=pd.DataFrame({
'City':['LHR','SHD','GJW','SKT'],
'temperature':[30,32,34,36]})
df1
Out[676]:
City temperature
0 LHR 30
1 SHD 32
2 GJW 34
3 SKT 36
In [677]:
df2=pd.DataFrame({
'City':['SKT','GJW','SHD','LHR'],
'windspeed':[7,8,9,10]})
df2
Out[677]:
City windspeed
0 SKT 7
1 GJW 8
2 SHD 9
3 LHR 10
In [678]:
df=pd.merge(df1,df2,on="City")
df
Out[678]:
City temperature windspeed
0 LHR 30 10
1 SHD 32 9
2 GJW 34 8
3 SKT 36 7

Example-2

In [679]:
df1= pd.read_csv('Family.csv',skipfooter=1,engine='python')
df1
Out[679]:
Name ID Program
0 Umer Saeed F2017313014 MS(DS)
1 Ali Saeed F2017313016 BBA
2 Ahmed Saeed F2017313018 MS(CS)
In [680]:
df2= pd.read_csv('Math.txt',sep=",")
df2
Out[680]:
ID Math
0 F2017313014 98
1 F2017313016 96
2 F2017313018 94
In [681]:
df=pd.merge(df1,df2,on="ID")
df
Out[681]:
Name ID Program Math
0 Umer Saeed F2017313014 MS(DS) 98
1 Ali Saeed F2017313016 BBA 96
2 Ahmed Saeed F2017313018 MS(CS) 94

Example-3

In [682]:
df1= pd.read_csv('ListStu.txt',sep=",")
df1
Out[682]:
ID Name Program
0 F2017313014 Umer Saeed MS(DS)
1 F2017313015 Ali Saeed BBA
2 F2017313016 Ahmed Abdullah Saeed BS(CS)
3 F2017313017 Bilal Iqbal MS(TE)
4 F2017313018 Irfan Kareem MS(CS)
5 F2017313019 Muhammad Ijlal Khan BS(CE)
In [683]:
df2= pd.read_csv('Listage.txt',sep=",")
df2
Out[683]:
ID age
0 F2017313014 37
1 F2017313016 28
2 F2017313017 25
3 F2017313019 28
In [684]:
df=pd.merge(df1,df2,on="ID")
df
Out[684]:
ID Name Program age
0 F2017313014 Umer Saeed MS(DS) 37
1 F2017313016 Ahmed Abdullah Saeed BS(CS) 28
2 F2017313017 Bilal Iqbal MS(TE) 25
3 F2017313019 Muhammad Ijlal Khan BS(CE) 28

Example-4

In [685]:
df=pd.merge(df1,df2,on="ID",how="inner")
df
Out[685]:
ID Name Program age
0 F2017313014 Umer Saeed MS(DS) 37
1 F2017313016 Ahmed Abdullah Saeed BS(CS) 28
2 F2017313017 Bilal Iqbal MS(TE) 25
3 F2017313019 Muhammad Ijlal Khan BS(CE) 28

Outer Join

In [686]:
df1= pd.read_csv('ListStu.txt',sep=",")
df1
Out[686]:
ID Name Program
0 F2017313014 Umer Saeed MS(DS)
1 F2017313015 Ali Saeed BBA
2 F2017313016 Ahmed Abdullah Saeed BS(CS)
3 F2017313017 Bilal Iqbal MS(TE)
4 F2017313018 Irfan Kareem MS(CS)
5 F2017313019 Muhammad Ijlal Khan BS(CE)
In [687]:
df2= pd.read_csv('Listage.txt',sep=",")
df2
Out[687]:
ID age
0 F2017313014 37
1 F2017313016 28
2 F2017313017 25
3 F2017313019 28
In [688]:
df=pd.merge(df1,df2,on="ID",how="outer")
df
Out[688]:
ID Name Program age
0 F2017313014 Umer Saeed MS(DS) 37.0
1 F2017313015 Ali Saeed BBA NaN
2 F2017313016 Ahmed Abdullah Saeed BS(CS) 28.0
3 F2017313017 Bilal Iqbal MS(TE) 25.0
4 F2017313018 Irfan Kareem MS(CS) NaN
5 F2017313019 Muhammad Ijlal Khan BS(CE) 28.0

Left Join

In [689]:
df1= pd.read_csv('ListStu.txt',sep=",")
df1
Out[689]:
ID Name Program
0 F2017313014 Umer Saeed MS(DS)
1 F2017313015 Ali Saeed BBA
2 F2017313016 Ahmed Abdullah Saeed BS(CS)
3 F2017313017 Bilal Iqbal MS(TE)
4 F2017313018 Irfan Kareem MS(CS)
5 F2017313019 Muhammad Ijlal Khan BS(CE)
In [690]:
df2= pd.read_csv('Listage.txt',sep=",")
df2
Out[690]:
ID age
0 F2017313014 37
1 F2017313016 28
2 F2017313017 25
3 F2017313019 28
In [691]:
df=pd.merge(df1,df2,on="ID",how="left")
df
Out[691]:
ID Name Program age
0 F2017313014 Umer Saeed MS(DS) 37.0
1 F2017313015 Ali Saeed BBA NaN
2 F2017313016 Ahmed Abdullah Saeed BS(CS) 28.0
3 F2017313017 Bilal Iqbal MS(TE) 25.0
4 F2017313018 Irfan Kareem MS(CS) NaN
5 F2017313019 Muhammad Ijlal Khan BS(CE) 28.0

Right Join

In [692]:
df1= pd.read_csv('ListStu.txt',sep=",")
df1
Out[692]:
ID Name Program
0 F2017313014 Umer Saeed MS(DS)
1 F2017313015 Ali Saeed BBA
2 F2017313016 Ahmed Abdullah Saeed BS(CS)
3 F2017313017 Bilal Iqbal MS(TE)
4 F2017313018 Irfan Kareem MS(CS)
5 F2017313019 Muhammad Ijlal Khan BS(CE)
In [693]:
df2= pd.read_csv('Listage.txt',sep=",")
df2
Out[693]:
ID age
0 F2017313014 37
1 F2017313016 28
2 F2017313017 25
3 F2017313019 28
In [694]:
df=pd.merge(df1,df2,on="ID",how="right")
df
Out[694]:
ID Name Program age
0 F2017313014 Umer Saeed MS(DS) 37
1 F2017313016 Ahmed Abdullah Saeed BS(CS) 28
2 F2017313017 Bilal Iqbal MS(TE) 25
3 F2017313019 Muhammad Ijlal Khan BS(CE) 28
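The four join types above can be compared side by side. The following is a minimal sketch, not the notebook's own code: it rebuilds the two tables inline instead of reading ListStu.txt and Listage.txt, and it adds one hypothetical ID (F2017313020, age 30) to the age table so that the right and outer joins differ from the inner and left joins.

```python
import pandas as pd

# Inline stand-ins for ListStu.txt and Listage.txt (values copied from the outputs above).
students = pd.DataFrame({
    "ID": ["F2017313014", "F2017313015", "F2017313016",
           "F2017313017", "F2017313018", "F2017313019"],
    "Name": ["Umer Saeed", "Ali Saeed", "Ahmed Abdullah Saeed",
             "Bilal Iqbal", "Irfan Kareem", "Muhammad Ijlal Khan"],
})
ages = pd.DataFrame({
    # The last ID is hypothetical: it has no matching row in `students`.
    "ID": ["F2017313014", "F2017313016", "F2017313017",
           "F2017313019", "F2017313020"],
    "age": [37, 28, 25, 28, 30],
})

# Count the rows produced by each join type.
row_counts = {
    how: len(pd.merge(students, ages, on="ID", how=how))
    for how in ("inner", "outer", "left", "right")
}
print(row_counts)
```

The inner join keeps only IDs present in both tables, the left and right joins keep every row of the left or right table respectively, and the outer join keeps the union of both.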

Indicator

In [695]:
import pandas as pd
df1= pd.read_csv('ListStu.txt',sep=",")
df1
Out[695]:
ID Name Program
0 F2017313014 Umer Saeed MS(DS)
1 F2017313015 Ali Saeed BBA
2 F2017313016 Ahmed Abdullah Saeed BS(CS)
3 F2017313017 Bilal Iqbal MS(TE)
4 F2017313018 Irfan Kareem MS(CS)
5 F2017313019 Muhammad Ijlal Khan BS(CE)
In [696]:
df2= pd.read_csv('Listage.txt',sep=",")
df2
Out[696]:
ID age
0 F2017313014 37
1 F2017313016 28
2 F2017313017 25
3 F2017313019 28
In [697]:
df=pd.merge(df1,df2,on="ID",how="outer",indicator=True)
df
Out[697]:
ID Name Program age _merge
0 F2017313014 Umer Saeed MS(DS) 37.0 both
1 F2017313015 Ali Saeed BBA NaN left_only
2 F2017313016 Ahmed Abdullah Saeed BS(CS) 28.0 both
3 F2017313017 Bilal Iqbal MS(TE) 25.0 both
4 F2017313018 Irfan Kareem MS(CS) NaN left_only
5 F2017313019 Muhammad Ijlal Khan BS(CE) 28.0 both
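One common use of the _merge column is to list the rows that found no partner in the other table. The sketch below is an illustration under the assumption that the two tables are rebuilt inline rather than read from the text files; the variable names are hypothetical.

```python
import pandas as pd

# Inline stand-ins for ListStu.txt and Listage.txt (values copied from the outputs above).
students = pd.DataFrame({
    "ID": ["F2017313014", "F2017313015", "F2017313016",
           "F2017313017", "F2017313018", "F2017313019"],
    "Name": ["Umer Saeed", "Ali Saeed", "Ahmed Abdullah Saeed",
             "Bilal Iqbal", "Irfan Kareem", "Muhammad Ijlal Khan"],
})
ages = pd.DataFrame({
    "ID": ["F2017313014", "F2017313016", "F2017313017", "F2017313019"],
    "age": [37, 28, 25, 28],
})

merged = pd.merge(students, ages, on="ID", how="outer", indicator=True)

# Keep only the students that have no row in the age table.
missing_age = merged.loc[merged["_merge"] == "left_only", ["ID", "Name"]]
print(missing_age)
```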

Suffixes

Example-1

In [698]:
df1=pd.DataFrame({
        'ID':['F2017313014','F2017313015'],
        'Marks':[98,97]
                })
df1
Out[698]:
ID Marks
0 F2017313014 98
1 F2017313015 97
In [699]:
df2=pd.DataFrame({
        'ID':['F2017313014','F2017313015'],
        'Marks':[96,95]
        })
df2
Out[699]:
ID Marks
0 F2017313014 96
1 F2017313015 95
In [700]:
df=pd.merge(df1,df2,on="ID")
df
Out[700]:
ID Marks_x Marks_y
0 F2017313014 98 96
1 F2017313015 97 95

Example-2

In [701]:
df1=pd.DataFrame({
        'ID':['F2017313014','F2017313015'],
        'Marks':[98,97]
                })
df1
Out[701]:
ID Marks
0 F2017313014 98
1 F2017313015 97
In [702]:
df2=pd.DataFrame({
        'ID':['F2017313014','F2017313015'],
        'Marks':[96,95]
        })
df2
Out[702]:
ID Marks
0 F2017313014 96
1 F2017313015 95
In [703]:
df=pd.merge(df1,df2,on="ID",suffixes=('_Math','_Physics'))
df
Out[703]:
ID Marks_Math Marks_Physics
0 F2017313014 98 96
1 F2017313015 97 95

Example-3

In [704]:
df1=pd.DataFrame({
        'ID':['F2017313014','F2017313015'],
        'Marks':[98,97],
        'Name':['Umer','Ali']
                })
df1
Out[704]:
ID Marks Name
0 F2017313014 98 Umer
1 F2017313015 97 Ali
In [705]:
df2=pd.DataFrame({
        'ID':['F2017313014','F2017313015'],
        'Marks':[96,95],
        'Name':['Amir','Yaqoob']
        })
df2
Out[705]:
ID Marks Name
0 F2017313014 96 Amir
1 F2017313015 95 Yaqoob
In [706]:
df=pd.merge(df1,df2,on="ID",suffixes=('_left','_right'))
df
Out[706]:
ID Marks_left Name_left Marks_right Name_right
0 F2017313014 98 Umer 96 Amir
1 F2017313015 97 Ali 95 Yaqoob
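Suffixes are applied only to non-key columns whose names appear in both frames; columns that exist on one side only keep their original names. A short sketch of this, assuming a hypothetical City column that exists only in the second frame:

```python
import pandas as pd

math = pd.DataFrame({
    "ID": ["F2017313014", "F2017313015"],
    "Marks": [98, 97],
    "Name": ["Umer", "Ali"],
})
physics = pd.DataFrame({
    "ID": ["F2017313014", "F2017313015"],
    "Marks": [96, 95],
    "City": ["Lahore", "Karachi"],  # hypothetical column, present on the right side only
})

merged = pd.merge(math, physics, on="ID", suffixes=("_Math", "_Physics"))

# Only 'Marks' overlaps, so only 'Marks' receives the suffixes.
print(list(merged.columns))
```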

merge_ordered

  • Perform a merge with optional filling/interpolation; it is designed for ordered data such as time series.
  • Optionally perform a group-wise merge (see the examples below).
In [707]:
df1 = pd.DataFrame({'key': list('aceace'), 'group': list('aaabbb'), 'lvalue': [1, 2, 3, 1, 2, 3]})[[ 'group','key', 'lvalue']]
df1
Out[707]:
group key lvalue
0 a a 1
1 a c 2
2 a e 3
3 b a 1
4 b c 2
5 b e 3
In [708]:
df2 = pd.DataFrame({'key': list('bcd'), 'rvalue': [1, 2, 3]})[[ 'rvalue','key']]
df2
Out[708]:
rvalue key
0 1 b
1 2 c
2 3 d
In [709]:
pd.merge_ordered(df1, df2, left_by='group')
Out[709]:
group key lvalue rvalue
0 a a 1.0 NaN
1 a b NaN 1.0
2 a c 2.0 2.0
3 a d NaN 3.0
4 a e 3.0 NaN
5 b a 1.0 NaN
6 b b NaN 1.0
7 b c 2.0 2.0
8 b d NaN 3.0
9 b e 3.0 NaN
In [710]:
pd.merge_ordered(df1, df2, left_by='group',fill_method='ffill')
Out[710]:
group key lvalue rvalue
0 a a 1 NaN
1 a b 1 1.0
2 a c 2 2.0
3 a d 2 3.0
4 a e 3 3.0
5 b a 1 NaN
6 b b 1 1.0
7 b c 2 2.0
8 b d 2 3.0
9 b e 3 3.0
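The same function can be used without grouping on time-series-like data. The sketch below is an assumption-laden illustration: the day, price and volume columns and their values are made up for the example.

```python
import pandas as pd

# Hypothetical ordered data: prices and volumes recorded on different days.
prices = pd.DataFrame({"day": [1, 3, 5], "price": [10, 11, 12]})
volumes = pd.DataFrame({"day": [2, 3, 6], "volume": [100, 200, 300]})

# Outer merge on the ordered key; gaps are filled with the last known value.
ordered = pd.merge_ordered(prices, volumes, on="day", fill_method="ffill")
print(ordered)
```

The first volume value stays NaN because there is no earlier observation to carry forward.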

merge_asof

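  • Perform an asof merge. This is similar to a left join, except that rows are matched on the nearest key rather than on equal keys. By default the search is backward: each left row is matched with the last right row whose key is less than or equal to the left key.
  • Both DataFrames must be sorted by the key.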
In [711]:
df1 = pd.DataFrame({'a': [1, 5, 10], 'left_val': ['a', 'b', 'c']})
df1
Out[711]:
a left_val
0 1 a
1 5 b
2 10 c
In [712]:
df2 = pd.DataFrame({'a': [1, 2, 3, 6, 7],
                      'right_val': [1, 2, 3, 6, 7]})
df2
Out[712]:
a right_val
0 1 1
1 2 2
2 3 3
3 6 6
4 7 7
In [713]:
pd.merge_asof(df1, df2, on='a')
Out[713]:
a left_val right_val
0 1 a 1
1 5 b 3
2 10 c 7
In [714]:
pd.merge_asof(df1, df2, on='a', allow_exact_matches=True)
Out[714]:
a left_val right_val
0 1 a 1
1 5 b 3
2 10 c 7
In [715]:
pd.merge_asof(df1, df2, on='a', allow_exact_matches=False)
Out[715]:
a left_val right_val
0 1 a NaN
1 5 b 3.0
2 10 c 7.0
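The examples above all use the default backward search. merge_asof also accepts a direction argument; the sketch below reuses the same two frames (under the hypothetical names left and right) to show the 'nearest' and 'forward' settings.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 5, 10], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"a": [1, 2, 3, 6, 7], "right_val": [1, 2, 3, 6, 7]})

# 'nearest' picks the closest right key in either direction.
nearest = pd.merge_asof(left, right, on="a", direction="nearest")

# 'forward' picks the first right key that is >= the left key.
forward = pd.merge_asof(left, right, on="a", direction="forward")

print(nearest)
print(forward)
```

With 'forward', the left key 10 has no right key at or after it, so its right_val is NaN.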

align

In [716]:
df1 = pd.DataFrame([[1,2,3,4], [6,7,8,9]], columns=['D', 'B', 'E', 'A'])
df1
Out[716]:
D B E A
0 1 2 3 4
1 6 7 8 9
In [717]:
df2 = pd.DataFrame([[10,20,30,40], [60,70,80,90], [600,700,800,900]], columns=['A', 'B', 'C', 'D'])
df2
Out[717]:
A B C D
0 10 20 30 40
1 60 70 80 90
2 600 700 800 900
In [718]:
a1,a2=df1.align(df2, join='inner', axis=1)
In [719]:
a1
Out[719]:
D B A
0 1 2 4
1 6 7 9
In [720]:
a2
Out[720]:
D B A
0 40 20 10
1 90 70 60
2 900 700 600
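For comparison, align can also keep the union of the columns instead of the intersection. This is a minimal sketch using the same two frames; the names o1 and o2 are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3, 4], [6, 7, 8, 9]], columns=["D", "B", "E", "A"])
df2 = pd.DataFrame([[10, 20, 30, 40], [60, 70, 80, 90], [600, 700, 800, 900]],
                   columns=["A", "B", "C", "D"])

# join='outer' keeps every column found in either frame;
# columns missing from one frame are filled with NaN in that frame.
o1, o2 = df1.align(df2, join="outer", axis=1)
print(o1)
print(o2)
```

Because axis=1 aligns columns only, each frame keeps its own number of rows.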

append

  • Append the rows of another DataFrame to the end of this frame, returning a new object.
In [721]:
a1.append(a2)
Out[721]:
D B A
0 1 2 4
1 6 7 9
0 40 20 10
1 90 70 60
2 900 700 600
In [722]:
a1.append(a2,ignore_index=True)
Out[722]:
D B A
0 1 2 4
1 6 7 9
2 40 20 10
3 90 70 60
4 900 700 600
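DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the two cells above only run on older versions. A sketch of the same result with pd.concat, with the aligned frames a1 and a2 rebuilt inline from the outputs above:

```python
import pandas as pd

# Same values as the aligned frames a1 and a2 shown above.
a1 = pd.DataFrame({"D": [1, 6], "B": [2, 7], "A": [4, 9]})
a2 = pd.DataFrame({"D": [40, 90, 900], "B": [20, 70, 700], "A": [10, 60, 600]})

# Equivalent of a1.append(a2): the original index labels are kept.
stacked = pd.concat([a1, a2])

# Equivalent of a1.append(a2, ignore_index=True): a fresh 0..n-1 index.
stacked_reindexed = pd.concat([a1, a2], ignore_index=True)

print(stacked)
print(stacked_reindexed)
```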

Add a column with the partial file name to a pandas DataFrame while importing many files

In [723]:
import os
from glob import glob

# Collect every CSV file whose name starts with 'Scopus'
all_files = glob('Scopus*.csv')
# Read each file and tag its rows with the file name without the extension
df_from_each_file = (pd.read_csv(f,encoding='latin-1').assign(fname=os.path.basename(f).split('.')[0]) for f in all_files)
concatdf = pd.concat(df_from_each_file, ignore_index=True)
concatdf[['Title','fname']].head()
Out[723]:
Title fname
0 Art. XIV.—Translation of a Bactrian Pali Inscr... Scopus_1870_1898
1 Concretionary Structure in Plaster Scopus_1870_1898
2 Dr. Feistmantel's paper on the gondwana series Scopus_1870_1898
3 What is an Erratic? Scopus_1870_1898
4 Artificial earthquakes [4] Scopus_1870_1898
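The Scopus files are not included here, so the pattern above can be tried on throwaway data instead. The sketch below writes two small hypothetical CSV files (Report_2019.csv and Report_2020.csv) to a temporary directory and uses pathlib's stem attribute in place of os.path.basename(...).split('.')[0]; this is a substitute technique, not the notebook's own code.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    # Two small hypothetical CSV files standing in for the Scopus exports.
    (tmp_path / "Report_2019.csv").write_text("Title\nPaper A\nPaper B\n")
    (tmp_path / "Report_2020.csv").write_text("Title\nPaper C\n")

    # Sort the paths so the row order is reproducible.
    files = sorted(tmp_path.glob("Report_*.csv"))

    # Tag every row with the name of the file it came from (without '.csv').
    frames = [pd.read_csv(f).assign(fname=f.stem) for f in files]
    combined = pd.concat(frames, ignore_index=True)

print(combined)
```

Note that stem removes only the final extension, whereas split('.')[0] cuts at the first dot in the name; the two agree for names like the ones used here.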

Build a DataFrame from multiple files

With respect to rows

In [724]:
scopus_files = sorted(glob('Scopus*.csv'))
scopus_files
Out[724]:
['Scopus_1870_1898.csv', 'Scopus_1901-1925.csv', 'Scopus_1926-1950.csv']
  • glob returns filenames in an arbitrary order, which is why we sorted the list using Python's built-in sorted() function.
  • We can then use a generator expression to read each of the files using read_csv() and pass the results to the concat() function, which will concatenate the rows into a single DataFrame:
In [725]:
pd.concat((pd.read_csv(file,encoding='latin-1') for file in scopus_files))
Out[725]:
Authors Author(s)ID Title Year Sourcetitle Volume Issue Art.No. Pagestart Pageend Pagecount Citedby DOI Link DocumentType PublicationStage AccessType Source EID
0 Dowson, J. 57189667124 Art. XIV.—Translation of a Bactrian Pali Inscr... 1870 Journal of the Royal Asiatic Society of Great ... 4 2 NaN 497.0 502.0 NaN NaN 10.1017/S0035869X00016075 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84973958732
1 Benwyan 57124066500 Concretionary Structure in Plaster 1871 Geological Magazine 8 85 NaN 333.0 334.0 NaN NaN 10.1017/S0016756800161588 https://www.scopus.com/inward/record.url?eid=2... Letter Final Open Access Scopus 2-s2.0-84958474637
2 Blanford, W.T. 48861017900 Dr. Feistmantel's paper on the gondwana series 1877 Geological Magazine 4 4 NaN 189.0 190.0 NaN NaN 10.1017/S0016756800149064 https://www.scopus.com/inward/record.url?eid=2... Letter Final Open Access Scopus 2-s2.0-84974085864
3 Wynne, A.B. 49662501200 What is an Erratic? 1878 Geological Magazine 5 4 NaN 185.0 187.0 NaN NaN 10.1017/S0016756800146631 https://www.scopus.com/inward/record.url?eid=2... Letter Final Open Access Scopus 2-s2.0-84958479816
4 Lewis, T.C. 24768550000 Artificial earthquakes [4] 1885 Nature 32 822 NaN 295.0 NaN NaN NaN 10.1038/032295a0 https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-51149163204
5 Fairland, E., Day, W.H. 57189112924; 57189113458 Therapeutic memoranda 1886 British Medical Journal 2 1344 NaN 629.0 NaN NaN NaN 10.1136/bmj.2.1344.629 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84965281283
6 John Hope, W.H.S. 36576681800 Tangier 1886 Notes and Queries s7-I 3 NaN 56.0 NaN NaN NaN 10.1093/nq/s7-I.3.56-g https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-77958214022
7 Constable, F.C. 24768594000 Birds and mirrors [8] 1886 Nature 34 865 NaN 76.0 NaN NaN NaN 10.1038/034076g0 https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-51149142548
8 Evatt, G. 36528517700 The title of the qualification of the society ... 1887 British Medical Journal 2 1391 NaN 483.0 NaN NaN NaN 10.1136/bmj.2.1391.483-a https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84965360789
9 Prichard, A. 57189111145 The present stock of humanised lymph 1887 British Medical Journal 2 1408 NaN 1402.0 NaN NaN NaN 10.1136/bmj.2.1408.1402-a https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84965202901
10 Eustace, M. 57189117009 “horror” and imperfect chloroform anæsthesia 1893 British Medical Journal 2 1710 NaN 814.0 815.0 NaN NaN 10.1136/bmj.2.1710.814-d https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84965224516
11 Williamis, W.R. 57189118768 Gall stones and cancer 1893 British Medical Journal 2 1704 NaN 490.0 NaN NaN NaN 10.1136/bmj.2.1704.490-b https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84965224100
12 Jackson, M.J. 55456067900 Printing mathematics [7] 1893 Nature 47 1210 NaN 227.0 NaN NaN NaN 10.1038/047227c0 https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-51149140956
13 Eustace, M. 57189117009 A case of hydatids of femur in the site of fra... 1894 British Medical Journal 1 1743 NaN 1124.0 1125.0 NaN NaN 10.1136/bmj.1.1743.1124-a https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84965227534
14 Battebsby, J. 57189157908 Chitral relief force 1895 British Medical Journal 2 1824 NaN 1526.0 NaN NaN NaN 10.1136/bmj.2.1824.1526-a https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84965349699
15 Toogood, F.S. 57189152489 Reports on medical & surgical practice in ... 1896 British Medical Journal 1 1845 NaN 1144.0 1145.0 NaN NaN 10.1136/bmj.1.1845.1144 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84965321899
16 Eustace, M. 57189117009 Club rates: A suggestion 1896 British Medical Journal 1 1828 NaN 116.0 NaN NaN NaN 10.1136/bmj.1.1828.116-b https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84965220641
17 Grant, D.G.F. 24769006300 Barisal guns [2] 1896 Nature 53 1366 NaN 197.0 NaN NaN NaN 10.1038/053197a0 https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-51149156804
18 Nunn, J.A. 57189120428 Sunstroke in animals 1898 British Medical Journal 1 1943 NaN 862.0 NaN NaN 1.0 10.1136/bmj.1.1943.862-a https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84942166966
0 Battersby, J. 57189430325 The recent epidemic of typhoid fever in south ... 1901 British Medical Journal 1 2111 NaN 1521.0 NaN NaN NaN 10.1136/bmj.1.2111.1521-a https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84970814279
1 Thomson, F. 57189118610 “Oral sepsis” in scarlet fever 1906 British Medical Journal 1 2357 NaN 534.0 535.0 NaN NaN 10.1136/bmj.1.2357.534-a https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84965219744
2 Brid, W.D. 57067161300 A Company of Russian Machine Guns at the Battl... 1906 Royal United Services Institution. Journal 50 346 NaN 1498.0 1503.0 NaN NaN 10.1080/03071840609431330 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84954899083
3 Jones, B.M. 23076847500 CLXIX. - The spontaneous crystallisation of so... 1908 Journal of the Chemical Society, Transactions 93 NaN NaN 1739.0 1747.0 NaN 2.0 10.1039/CT9089301739 https://www.scopus.com/inward/record.url?eid=2... Review Final NaN Scopus 2-s2.0-37049152660
4 Hemmy, A.S. 36964532600 The earth and comets' tails [5] 1910 Nature 83 2120 NaN 459.0 NaN NaN NaN NaN https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-51149133856
5 McCarrison, R. 23077533900 Further experimental researches on the etiolog... 1911 Annals of Tropical Medicine and Parasitology 5 1 NaN 1.0 14.0 NaN NaN 10.1080/00034983.1911.11686338 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84975010534
6 Harris, G.C. 57066128000 Spun silk sewing thread counts 1911 Journal of the Textile Institute Proceedings a... 2 2 NaN 160.0 NaN NaN NaN 10.1080/00405001108631719 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84954867566
7 Bomford, T. 56907822900 THE RIGHT ANGLE OF APPROACH 1911 The Muslim World 1 3 NaN 283.0 288.0 NaN NaN 10.1111/j.1478-1913.1911.tb00034.x https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84944486888
8 Weitbrecht, H.U. 56907875600 THE LUCKNOW CONFERENCE 1911 The Muslim World 1 2 NaN 164.0 175.0 NaN 1.0 10.1111/j.1478-1913.1911.tb00018.x https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84937435498
9 Adie, H.A. 24752677900 Note on the sex of mosquito larvae 1912 Annals of Tropical Medicine and Parasitology 6 4 NaN 463.0 466.0 NaN NaN 10.1080/00034983.1912.11687086 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84974990645
10 Grifwold, H.D. 57190358443 THE AHMADIYA MOVEMENT 1912 The Muslim World 2 4 NaN 373.0 379.0 NaN NaN 10.1111/j.1478-1913.1912.tb00161.x https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84979315010
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
235 Singh, B.K., Tewari, R.K. 23078762700; 35955197300 Studies on the nature of the racemic modificat... 1947 Proceedings of the Indian Academy of Sciences ... 25 5 NaN 389.0 396.0 NaN NaN 10.1007/BF03171413 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-77951514314
236 Singh, B.K., Nayar, B.K.K. 23078762700; 35954514500 Studies on the nature of the racemic modificat... 1947 Proceedings of the Indian Academy of Sciences ... 25 4 NaN 368.0 374.0 NaN NaN 10.1007/BF03170771 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-77951508751
237 Singh, I., Singh, S.I., Muthana, M.C. 57091912500; 35935612000; 35934898800 (I) The interaction between ions drugs and ele... 1947 Proceedings of the Indian Academy of Sciences ... 25 3 NaN 51.0 56.0 NaN NaN 10.1007/BF03049677 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-77951500967
238 Bambah, R.P. 6507260435 Ramanujan's function ? (N)—k congruence property 1947 Bulletin of the American Mathematical Society 53 8 NaN 764.0 765.0 NaN 2.0 10.1090/S0002-9904-1947-08869-8 https://www.scopus.com/inward/record.url?eid=2... Note Final Open Access Scopus 2-s2.0-84955754437
239 Bambah, R.P., Chowla, S. 6507260435; 35933858000 A new congruence property of ramanujan's funct... 1947 Bulletin of the American Mathematical Society 53 8 NaN 768.0 769.0 NaN 3.0 10.1090/S0002-9904-1947-08871-6 https://www.scopus.com/inward/record.url?eid=2... Note Final Open Access Scopus 2-s2.0-84955628829
240 Bambah, R.P., Chowla, S. 6507260435; 35933858000 Congruence properties of ramanujan's function ... 1947 Bulletin of the American Mathematical Society 53 10 NaN 950.0 955.0 NaN 1.0 10.1090/S0002-9904-1947-08913-8 https://www.scopus.com/inward/record.url?eid=2... Note Final Open Access Scopus 2-s2.0-84955627327
241 Madhok, M.R., Fazal-Ud-Din 56713819200; 24771281300 A simple method of isolating bacteria-free cul... 1947 Soil Science 64 2 NaN 97.0 99.0 NaN NaN NaN https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84936623224
242 Madan, K.E. 36507306100 Spinal analgesia, with sacral escape 1947 British Journal of Anaesthesia 20 3 NaN 95.0 96.0 NaN NaN 10.1093/bja/20.3.95 https://www.scopus.com/inward/record.url?eid=2... Article Final Open Access Scopus 2-s2.0-77957177630
243 Singh, I., Singh, S.I., Muthana, M.C. 57091912500; 35935612000; 35934898800 Errata 1947 Proceedings of the Indian Academy of Sciences ... 25 1 NaN NaN NaN i NaN 10.1007/BF03048782 https://www.scopus.com/inward/record.url?eid=2... Erratum Final NaN Scopus 2-s2.0-77951500152
244 Bambah, R.P., Chowla, S., Gupta, H. 6507260435; 35933858000; 26630472400 A congruence property of ramanujan's function ... 1947 Bulletin of the American Mathematical Society 53 8 NaN 766.0 767.0 NaN NaN 10.1090/S0002-9904-1947-08870-4 https://www.scopus.com/inward/record.url?eid=2... Note Final Open Access Scopus 2-s2.0-84966250351
245 Gill, P.S. 23056443300 Azimuthal variations of cosmic radiation at La... 1947 Physical Review 71 7 NaN 398.0 399.0 NaN NaN 10.1103/PhysRev.71.398 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-36149019818
246 Gill, P.S. 23056443300 Production of mesotrons up to 30,000 feet at a... 1947 Physical Review 71 2 NaN 82.0 84.0 NaN NaN 10.1103/PhysRev.71.82 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-36149003140
247 Singh, B.K., Kumar, A. 23078762700; 35954051100 Chemical examination of the seeds of Carthamus... 1948 Proceedings of the Indian Academy of Sciences ... 27 2 NaN 147.0 155.0 NaN NaN 10.1007/BF03170888 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-77951511496
248 Singh, B.K., Kumar, A. 23078762700; 35954051100 Chemical examination of seeds of Raphanus sati... 1948 Proceedings of the Indian Academy of Sciences ... 27 2 NaN 156.0 164.0 NaN NaN 10.1007/BF03170889 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-77951509776
249 Islam, N. 57152094500 Crux of the Poultry Problem in Pakistan 1948 World's Poultry Science Journal 4 3 NaN 189.0 190.0 NaN NaN 10.1079/WPS19480029 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84959606653
250 Singh, B.K., Singh Manhas, K.M. 23078762700; 35954890600 Studies on the dependence of optical activity ... 1948 Proceedings of the Indian Academy of Sciences ... 27 1 NaN 1.0 13.0 NaN NaN 10.1007/BF03173435 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-77951514722
251 Singh, B.K., Nayar, B.K.K. 23078762700; 35954514500 Studies on the dependence of optical rotatory ... 1948 Proceedings of the Indian Academy of Sciences ... 27 1 NaN 61.0 71.0 NaN NaN 10.1007/BF03173444 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-77951509616
252 Rabbani, M. 24732676200 PENILE CARCINOMA AND CIRCUMCISION 1949 The Lancet 253 6543 NaN 163.0 NaN NaN NaN 10.1016/S0140-6736(49)90446-8 https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-50349119540
253 Zernike, J. 57190717074 General considerations concerning the number o... 1949 Recueil des Travaux Chimiques des Pays?Bas 68 6 NaN 585.0 594.0 NaN 14.0 10.1002/recl.19490680613 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84982064208
254 Brown, J.D. 57190237879 THE HISTORY OF ISLAM IN INDIA II 1949 The Muslim World 39 2 NaN 113.0 125.0 NaN NaN 10.1111/j.1478-1913.1949.tb01000.x https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84978586812
255 Brown, J.D. 57190237879 THE HISTORY OF ISLAM IN INDIA III 1949 The Muslim World 39 3 NaN 179.0 194.0 NaN NaN 10.1111/j.1478-1913.1949.tb01009.x https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84978580407
256 Brown, J.D. 57190237879 THE HISTORY OF ISLAM IN INDIA 1949 The Muslim World 39 1 NaN 11.0 25.0 NaN NaN 10.1111/j.1478-1913.1949.tb00991.x https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84978555355
257 Nasir, M.M. 55564241800 Recent work on mercury as an insecticide again... 1949 Bulletin of Entomological Research 40 2 NaN 299.0 304.0 NaN NaN 10.1017/S0007485300024561 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84958232084
258 Janjua, N.A., Mehra, R.N. 36732830600; 57128824800 The biology of quettania coeruleipennis schwar... 1949 Bulletin of Entomological Research 40 2 NaN 203.0 206.0 NaN NaN 10.1017/S0007485300024500 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-78650998943
259 Latif, A. 35470801600 The taxonomic Status of Drosicha stebbingi (Gr... 1949 Bulletin of Entomological Research 40 3 NaN 351.0 354.0 NaN 3.0 10.1017/S0007485300022811 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-76549232178
260 Parshad, R., Karim, S. 23043040200; 22988689100 Decrease of an electrical discharge by externa... 1949 The Journal of Chemical Physics 17 7 NaN 667.0 668.0 NaN NaN 10.1063/1.1747362 https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-36849123816
261 Pasha, U.A. 25956627600 Report of homœopathy in Pakistan 1950 British Homoeopathic Journal 40 4 NaN 206.0 207.0 NaN NaN 10.1016/S0007-0785(50)80039-X https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-58149437070
262 Zernike, J. 57190717074 On the range of existence of the liquid state 1950 Recueil des Travaux Chimiques des Pays?Bas 69 1 NaN 116.0 124.0 NaN 1.0 10.1002/recl.19500690113 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84982070641
263 De, H.N., Guha, S.R. 35454815400; 35470067300 Effect of p-dimethylaminoazobenzene, o-aminoaz... 1950 British Journal of Cancer 4 4 NaN 430.0 433.0 NaN NaN 10.1038/bjc.1950.42 https://www.scopus.com/inward/record.url?eid=2... Article Final Open Access Scopus 2-s2.0-76549239101
264 Rashid, A. 35457263100 Atypical case of foreign body in the esophagus 1950 Laryngoscope 60 9 NaN 945.0 946.0 NaN NaN NaN https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-76549215347

349 rows × 19 columns

  • Unfortunately, there are now duplicate values in the index. To avoid that, we can tell the concat() function to ignore the index and instead use the default integer index:
In [726]:
pd.concat((pd.read_csv(file,encoding='latin-1') for file in scopus_files), ignore_index=True)
Out[726]:
Authors Author(s)ID Title Year Sourcetitle Volume Issue Art.No. Pagestart Pageend Pagecount Citedby DOI Link DocumentType PublicationStage AccessType Source EID
0 Dowson, J. 57189667124 Art. XIV.—Translation of a Bactrian Pali Inscr... 1870 Journal of the Royal Asiatic Society of Great ... 4 2 NaN 497.0 502.0 NaN NaN 10.1017/S0035869X00016075 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84973958732
1 Benwyan 57124066500 Concretionary Structure in Plaster 1871 Geological Magazine 8 85 NaN 333.0 334.0 NaN NaN 10.1017/S0016756800161588 https://www.scopus.com/inward/record.url?eid=2... Letter Final Open Access Scopus 2-s2.0-84958474637
2 Blanford, W.T. 48861017900 Dr. Feistmantel's paper on the gondwana series 1877 Geological Magazine 4 4 NaN 189.0 190.0 NaN NaN 10.1017/S0016756800149064 https://www.scopus.com/inward/record.url?eid=2... Letter Final Open Access Scopus 2-s2.0-84974085864
3 Wynne, A.B. 49662501200 What is an Erratic? 1878 Geological Magazine 5 4 NaN 185.0 187.0 NaN NaN 10.1017/S0016756800146631 https://www.scopus.com/inward/record.url?eid=2... Letter Final Open Access Scopus 2-s2.0-84958479816
4 Lewis, T.C. 24768550000 Artificial earthquakes [4] 1885 Nature 32 822 NaN 295.0 NaN NaN NaN 10.1038/032295a0 https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-51149163204
5 Fairland, E., Day, W.H. 57189112924; 57189113458 Therapeutic memoranda 1886 British Medical Journal 2 1344 NaN 629.0 NaN NaN NaN 10.1136/bmj.2.1344.629 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84965281283
6 John Hope, W.H.S. 36576681800 Tangier 1886 Notes and Queries s7-I 3 NaN 56.0 NaN NaN NaN 10.1093/nq/s7-I.3.56-g https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-77958214022
7 Constable, F.C. 24768594000 Birds and mirrors [8] 1886 Nature 34 865 NaN 76.0 NaN NaN NaN 10.1038/034076g0 https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-51149142548
8 Evatt, G. 36528517700 The title of the qualification of the society ... 1887 British Medical Journal 2 1391 NaN 483.0 NaN NaN NaN 10.1136/bmj.2.1391.483-a https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84965360789
9 Prichard, A. 57189111145 The present stock of humanised lymph 1887 British Medical Journal 2 1408 NaN 1402.0 NaN NaN NaN 10.1136/bmj.2.1408.1402-a https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84965202901
10 Eustace, M. 57189117009 “horror” and imperfect chloroform anæsthesia 1893 British Medical Journal 2 1710 NaN 814.0 815.0 NaN NaN 10.1136/bmj.2.1710.814-d https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84965224516
11 Williamis, W.R. 57189118768 Gall stones and cancer 1893 British Medical Journal 2 1704 NaN 490.0 NaN NaN NaN 10.1136/bmj.2.1704.490-b https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84965224100
12 Jackson, M.J. 55456067900 Printing mathematics [7] 1893 Nature 47 1210 NaN 227.0 NaN NaN NaN 10.1038/047227c0 https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-51149140956
13 Eustace, M. 57189117009 A case of hydatids of femur in the site of fra... 1894 British Medical Journal 1 1743 NaN 1124.0 1125.0 NaN NaN 10.1136/bmj.1.1743.1124-a https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84965227534
14 Battebsby, J. 57189157908 Chitral relief force 1895 British Medical Journal 2 1824 NaN 1526.0 NaN NaN NaN 10.1136/bmj.2.1824.1526-a https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84965349699
15 Toogood, F.S. 57189152489 Reports on medical & surgical practice in ... 1896 British Medical Journal 1 1845 NaN 1144.0 1145.0 NaN NaN 10.1136/bmj.1.1845.1144 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84965321899
16 Eustace, M. 57189117009 Club rates: A suggestion 1896 British Medical Journal 1 1828 NaN 116.0 NaN NaN NaN 10.1136/bmj.1.1828.116-b https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84965220641
17 Grant, D.G.F. 24769006300 Barisal guns [2] 1896 Nature 53 1366 NaN 197.0 NaN NaN NaN 10.1038/053197a0 https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-51149156804
18 Nunn, J.A. 57189120428 Sunstroke in animals 1898 British Medical Journal 1 1943 NaN 862.0 NaN NaN 1.0 10.1136/bmj.1.1943.862-a https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84942166966
19 Battersby, J. 57189430325 The recent epidemic of typhoid fever in south ... 1901 British Medical Journal 1 2111 NaN 1521.0 NaN NaN NaN 10.1136/bmj.1.2111.1521-a https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84970814279
20 Thomson, F. 57189118610 “Oral sepsis” in scarlet fever 1906 British Medical Journal 1 2357 NaN 534.0 535.0 NaN NaN 10.1136/bmj.1.2357.534-a https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-84965219744
21 Brid, W.D. 57067161300 A Company of Russian Machine Guns at the Battl... 1906 Royal United Services Institution. Journal 50 346 NaN 1498.0 1503.0 NaN NaN 10.1080/03071840609431330 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84954899083
22 Jones, B.M. 23076847500 CLXIX. - The spontaneous crystallisation of so... 1908 Journal of the Chemical Society, Transactions 93 NaN NaN 1739.0 1747.0 NaN 2.0 10.1039/CT9089301739 https://www.scopus.com/inward/record.url?eid=2... Review Final NaN Scopus 2-s2.0-37049152660
23 Hemmy, A.S. 36964532600 The earth and comets' tails [5] 1910 Nature 83 2120 NaN 459.0 NaN NaN NaN NaN https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-51149133856
24 McCarrison, R. 23077533900 Further experimental researches on the etiolog... 1911 Annals of Tropical Medicine and Parasitology 5 1 NaN 1.0 14.0 NaN NaN 10.1080/00034983.1911.11686338 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84975010534
25 Harris, G.C. 57066128000 Spun silk sewing thread counts 1911 Journal of the Textile Institute Proceedings a... 2 2 NaN 160.0 NaN NaN NaN 10.1080/00405001108631719 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84954867566
26 Bomford, T. 56907822900 THE RIGHT ANGLE OF APPROACH 1911 The Muslim World 1 3 NaN 283.0 288.0 NaN NaN 10.1111/j.1478-1913.1911.tb00034.x https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84944486888
27 Weitbrecht, H.U. 56907875600 THE LUCKNOW CONFERENCE 1911 The Muslim World 1 2 NaN 164.0 175.0 NaN 1.0 10.1111/j.1478-1913.1911.tb00018.x https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84937435498
28 Adie, H.A. 24752677900 Note on the sex of mosquito larvae 1912 Annals of Tropical Medicine and Parasitology 6 4 NaN 463.0 466.0 NaN NaN 10.1080/00034983.1912.11687086 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84974990645
29 Grifwold, H.D. 57190358443 THE AHMADIYA MOVEMENT 1912 The Muslim World 2 4 NaN 373.0 379.0 NaN NaN 10.1111/j.1478-1913.1912.tb00161.x https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84979315010
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
319 Singh, B.K., Tewari, R.K. 23078762700; 35955197300 Studies on the nature of the racemic modificat... 1947 Proceedings of the Indian Academy of Sciences ... 25 5 NaN 389.0 396.0 NaN NaN 10.1007/BF03171413 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-77951514314
320 Singh, B.K., Nayar, B.K.K. 23078762700; 35954514500 Studies on the nature of the racemic modificat... 1947 Proceedings of the Indian Academy of Sciences ... 25 4 NaN 368.0 374.0 NaN NaN 10.1007/BF03170771 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-77951508751
321 Singh, I., Singh, S.I., Muthana, M.C. 57091912500; 35935612000; 35934898800 (I) The interaction between ions drugs and ele... 1947 Proceedings of the Indian Academy of Sciences ... 25 3 NaN 51.0 56.0 NaN NaN 10.1007/BF03049677 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-77951500967
322 Bambah, R.P. 6507260435 Ramanujan's function ? (N)—k congruence property 1947 Bulletin of the American Mathematical Society 53 8 NaN 764.0 765.0 NaN 2.0 10.1090/S0002-9904-1947-08869-8 https://www.scopus.com/inward/record.url?eid=2... Note Final Open Access Scopus 2-s2.0-84955754437
323 Bambah, R.P., Chowla, S. 6507260435; 35933858000 A new congruence property of ramanujan's funct... 1947 Bulletin of the American Mathematical Society 53 8 NaN 768.0 769.0 NaN 3.0 10.1090/S0002-9904-1947-08871-6 https://www.scopus.com/inward/record.url?eid=2... Note Final Open Access Scopus 2-s2.0-84955628829
324 Bambah, R.P., Chowla, S. 6507260435; 35933858000 Congruence properties of ramanujan's function ... 1947 Bulletin of the American Mathematical Society 53 10 NaN 950.0 955.0 NaN 1.0 10.1090/S0002-9904-1947-08913-8 https://www.scopus.com/inward/record.url?eid=2... Note Final Open Access Scopus 2-s2.0-84955627327
325 Madhok, M.R., Fazal-Ud-Din 56713819200; 24771281300 A simple method of isolating bacteria-free cul... 1947 Soil Science 64 2 NaN 97.0 99.0 NaN NaN NaN https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84936623224
326 Madan, K.E. 36507306100 Spinal analgesia, with sacral escape 1947 British Journal of Anaesthesia 20 3 NaN 95.0 96.0 NaN NaN 10.1093/bja/20.3.95 https://www.scopus.com/inward/record.url?eid=2... Article Final Open Access Scopus 2-s2.0-77957177630
327 Singh, I., Singh, S.I., Muthana, M.C. 57091912500; 35935612000; 35934898800 Errata 1947 Proceedings of the Indian Academy of Sciences ... 25 1 NaN NaN NaN i NaN 10.1007/BF03048782 https://www.scopus.com/inward/record.url?eid=2... Erratum Final NaN Scopus 2-s2.0-77951500152
328 Bambah, R.P., Chowla, S., Gupta, H. 6507260435; 35933858000; 26630472400 A congruence property of ramanujan's function ... 1947 Bulletin of the American Mathematical Society 53 8 NaN 766.0 767.0 NaN NaN 10.1090/S0002-9904-1947-08870-4 https://www.scopus.com/inward/record.url?eid=2... Note Final Open Access Scopus 2-s2.0-84966250351
329 Gill, P.S. 23056443300 Azimuthal variations of cosmic radiation at La... 1947 Physical Review 71 7 NaN 398.0 399.0 NaN NaN 10.1103/PhysRev.71.398 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-36149019818
330 Gill, P.S. 23056443300 Production of mesotrons up to 30,000 feet at a... 1947 Physical Review 71 2 NaN 82.0 84.0 NaN NaN 10.1103/PhysRev.71.82 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-36149003140
331 Singh, B.K., Kumar, A. 23078762700; 35954051100 Chemical examination of the seeds of Carthamus... 1948 Proceedings of the Indian Academy of Sciences ... 27 2 NaN 147.0 155.0 NaN NaN 10.1007/BF03170888 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-77951511496
332 Singh, B.K., Kumar, A. 23078762700; 35954051100 Chemical examination of seeds of Raphanus sati... 1948 Proceedings of the Indian Academy of Sciences ... 27 2 NaN 156.0 164.0 NaN NaN 10.1007/BF03170889 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-77951509776
333 Islam, N. 57152094500 Crux of the Poultry Problem in Pakistan 1948 World's Poultry Science Journal 4 3 NaN 189.0 190.0 NaN NaN 10.1079/WPS19480029 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84959606653
334 Singh, B.K., Singh Manhas, K.M. 23078762700; 35954890600 Studies on the dependence of optical activity ... 1948 Proceedings of the Indian Academy of Sciences ... 27 1 NaN 1.0 13.0 NaN NaN 10.1007/BF03173435 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-77951514722
335 Singh, B.K., Nayar, B.K.K. 23078762700; 35954514500 Studies on the dependence of optical rotatory ... 1948 Proceedings of the Indian Academy of Sciences ... 27 1 NaN 61.0 71.0 NaN NaN 10.1007/BF03173444 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-77951509616
336 Rabbani, M. 24732676200 PENILE CARCINOMA AND CIRCUMCISION 1949 The Lancet 253 6543 NaN 163.0 NaN NaN NaN 10.1016/S0140-6736(49)90446-8 https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-50349119540
337 Zernike, J. 57190717074 General considerations concerning the number o... 1949 Recueil des Travaux Chimiques des Pays?Bas 68 6 NaN 585.0 594.0 NaN 14.0 10.1002/recl.19490680613 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84982064208
338 Brown, J.D. 57190237879 THE HISTORY OF ISLAM IN INDIA II 1949 The Muslim World 39 2 NaN 113.0 125.0 NaN NaN 10.1111/j.1478-1913.1949.tb01000.x https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84978586812
339 Brown, J.D. 57190237879 THE HISTORY OF ISLAM IN INDIA III 1949 The Muslim World 39 3 NaN 179.0 194.0 NaN NaN 10.1111/j.1478-1913.1949.tb01009.x https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84978580407
340 Brown, J.D. 57190237879 THE HISTORY OF ISLAM IN INDIA 1949 The Muslim World 39 1 NaN 11.0 25.0 NaN NaN 10.1111/j.1478-1913.1949.tb00991.x https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84978555355
341 Nasir, M.M. 55564241800 Recent work on mercury as an insecticide again... 1949 Bulletin of Entomological Research 40 2 NaN 299.0 304.0 NaN NaN 10.1017/S0007485300024561 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84958232084
342 Janjua, N.A., Mehra, R.N. 36732830600; 57128824800 The biology of quettania coeruleipennis schwar... 1949 Bulletin of Entomological Research 40 2 NaN 203.0 206.0 NaN NaN 10.1017/S0007485300024500 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-78650998943
343 Latif, A. 35470801600 The taxonomic Status of Drosicha stebbingi (Gr... 1949 Bulletin of Entomological Research 40 3 NaN 351.0 354.0 NaN 3.0 10.1017/S0007485300022811 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-76549232178
344 Parshad, R., Karim, S. 23043040200; 22988689100 Decrease of an electrical discharge by externa... 1949 The Journal of Chemical Physics 17 7 NaN 667.0 668.0 NaN NaN 10.1063/1.1747362 https://www.scopus.com/inward/record.url?eid=2... Letter Final NaN Scopus 2-s2.0-36849123816
345 Pasha, U.A. 25956627600 Report of homœopathy in Pakistan 1950 British Homoeopathic Journal 40 4 NaN 206.0 207.0 NaN NaN 10.1016/S0007-0785(50)80039-X https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-58149437070
346 Zernike, J. 57190717074 On the range of existence of the liquid state 1950 Recueil des Travaux Chimiques des Pays?Bas 69 1 NaN 116.0 124.0 NaN 1.0 10.1002/recl.19500690113 https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-84982070641
347 De, H.N., Guha, S.R. 35454815400; 35470067300 Effect of p-dimethylaminoazobenzene, o-aminoaz... 1950 British Journal of Cancer 4 4 NaN 430.0 433.0 NaN NaN 10.1038/bjc.1950.42 https://www.scopus.com/inward/record.url?eid=2... Article Final Open Access Scopus 2-s2.0-76549239101
348 Rashid, A. 35457263100 Atypical case of foreign body in the esophagus 1950 Laryngoscope 60 9 NaN 945.0 946.0 NaN NaN NaN https://www.scopus.com/inward/record.url?eid=2... Article Final NaN Scopus 2-s2.0-76549215347

349 rows × 19 columns

w.r.t column

In [727]:
student_files = sorted(glob('Student*.csv'))
student_files
Out[727]:
['Student1.csv', 'Student2.csv', 'Student3.csv']
In [728]:
pd.concat((pd.read_csv(file) for file in student_files), axis='columns')
Out[728]:
Name ID Physics Math
0 Umer Saeed F2017313014 90 50
1 Ali Saeed F2017313016 95 45
2 Ahmed Saeed F2017313018 97 47
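
A minimal, self-contained sketch of column-wise concatenation. The two small frames are hypothetical stand-ins for the Student*.csv files, which are not reproduced here. With axis='columns', rows are matched by index label, so frames that share the same index end up side by side.

```python
import pandas as pd

# Hypothetical stand-ins for two of the Student*.csv files
names = pd.DataFrame({"Name": ["Umer Saeed", "Ali Saeed"],
                      "ID": ["F2017313014", "F2017313016"]})
marks = pd.DataFrame({"Physics": [90, 95], "Math": [50, 45]})

# axis='columns' (same as axis=1) places the frames side by side,
# matching rows by index label
combined = pd.concat([names, marks], axis="columns")
print(combined)
```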

11.28 Pivot

In [729]:
from IPython.display import YouTubeVideo
YouTubeVideo('xPPs59pn6qU',width=900, height=500)
Out[729]:
In [730]:
df= pd.read_csv('Weather3.csv')
df
Out[730]:
Date City Temperature Humidity
0 1-Jan-19 Lahore 5 20
1 2-Jan-19 Lahore 8 25
2 3-Jan-19 Lahore 10 15
3 1-Jan-19 Karachi 15 40
4 2-Jan-19 Karachi 18 45
5 3-Jan-19 Karachi 20 50
6 1-Jan-19 RWP/ISB 4 15
7 2-Jan-19 RWP/ISB 7 20
8 3-Jan-19 RWP/ISB 9 10
In [731]:
df.pivot(index="Date",columns="City")
Out[731]:
Temperature Humidity
City Karachi Lahore RWP/ISB Karachi Lahore RWP/ISB
Date
1-Jan-19 15 5 4 40 20 15
2-Jan-19 18 8 7 45 25 20
3-Jan-19 20 10 9 50 15 10
  • If only the Temperature values are required, pass values='Temperature'.
In [732]:
df.pivot(index="Date",columns="City",values='Temperature')
Out[732]:
City Karachi Lahore RWP/ISB
Date
1-Jan-19 15 5 4
2-Jan-19 18 8 7
3-Jan-19 20 10 9
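
The reshaping above can be reproduced without the CSV file. This is a sketch under the assumption that a few rows typed inline behave like Weather3.csv; the variable names are illustrative only.

```python
import pandas as pd

# Inline copy of a few Weather3.csv rows (the file itself is assumed, not included)
weather = pd.DataFrame({
    "Date": ["1-Jan-19", "2-Jan-19", "1-Jan-19", "2-Jan-19"],
    "City": ["Lahore", "Lahore", "Karachi", "Karachi"],
    "Temperature": [5, 8, 15, 18],
})

# One row per Date, one column per City, cells filled from Temperature
temps = weather.pivot(index="Date", columns="City", values="Temperature")
print(temps)
```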

11.29 Pivot Table

  • More Information
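  • A pivot table summarizes and aggregates the data in a DataFrame. Unlike pivot(), it accepts several rows for the same index/column pair and combines them with an aggregation function (the mean by default).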
In [733]:
df= pd.read_csv('Weather4.csv')
df
Out[733]:
Date City Temperature Humidity
0 1-Jan-19 Lahore 5 20
1 1-Jan-19 Lahore 4 15
2 2-Jan-19 Lahore 8 25
3 2-Jan-19 Lahore 7 10
4 3-Jan-19 Lahore 10 15
5 3-Jan-19 Lahore 1 2
6 1-Jan-19 Karachi 15 40
7 1-Jan-19 Karachi 10 20
8 2-Jan-19 Karachi 18 45
9 2-Jan-19 Karachi 9 22
10 3-Jan-19 Karachi 20 50
11 3-Jan-19 Karachi 10 25
12 1-Jan-19 RWP/ISB 4 15
13 1-Jan-19 RWP/ISB 2 7
14 2-Jan-19 RWP/ISB 7 20
15 2-Jan-19 RWP/ISB 3 10
16 3-Jan-19 RWP/ISB 9 10
17 3-Jan-19 RWP/ISB 3 3

Example-1

In [734]:
df.pivot_table(index="City",columns="Date")
Out[734]:
Humidity Temperature
Date 1-Jan-19 2-Jan-19 3-Jan-19 1-Jan-19 2-Jan-19 3-Jan-19
City
Karachi 30.0 33.5 37.5 12.5 13.5 15.0
Lahore 17.5 17.5 8.5 4.5 7.5 5.5
RWP/ISB 11.0 15.0 6.5 3.0 5.0 6.0

Example-2

In [735]:
df.pivot_table(index="City",columns="Date",values='Temperature')
Out[735]:
Date 1-Jan-19 2-Jan-19 3-Jan-19
City
Karachi 12.5 13.5 15.0
Lahore 4.5 7.5 5.5
RWP/ISB 3.0 5.0 6.0

Example-3

In [736]:
df.pivot_table(index="City",columns="Date",values='Temperature',aggfunc='sum')
Out[736]:
Date 1-Jan-19 2-Jan-19 3-Jan-19
City
Karachi 25 27 30
Lahore 9 15 11
RWP/ISB 6 10 12

Example-4

In [737]:
df.pivot_table(index="City",columns="Date",values='Temperature',aggfunc='count')
Out[737]:
Date 1-Jan-19 2-Jan-19 3-Jan-19
City
Karachi 2 2 2
Lahore 2 2 2
RWP/ISB 2 2 2

Example-5

In [738]:
df.pivot_table(index="City",columns="Date",values='Temperature',aggfunc='mean')
Out[738]:
Date 1-Jan-19 2-Jan-19 3-Jan-19
City
Karachi 12.5 13.5 15.0
Lahore 4.5 7.5 5.5
RWP/ISB 3.0 5.0 6.0

Example-6

In [739]:
df.pivot_table(index="City",columns="Date",values='Temperature',margins=True,aggfunc='sum')
Out[739]:
Date 1-Jan-19 2-Jan-19 3-Jan-19 All
City
Karachi 25 27 30 82
Lahore 9 15 11 35
RWP/ISB 6 10 12 28
All 40 52 53 145

Example-7

In [740]:
df.pivot_table(index="City",columns="Date",values='Temperature',margins=True)
Out[740]:
Date 1-Jan-19 2-Jan-19 3-Jan-19 All
City
Karachi 12.500000 13.500000 15.000000 13.666667
Lahore 4.500000 7.500000 5.500000 5.833333
RWP/ISB 3.000000 5.000000 6.000000 4.666667
All 6.666667 8.666667 8.833333 8.055556
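
The difference between pivot() and pivot_table() shows up as soon as an index/column pair occurs more than once. The sketch below uses a few hypothetical readings typed inline: pivot() raises a ValueError on the duplicates, while pivot_table() aggregates them. Passing a list to aggfunc computes several summaries at once.

```python
import pandas as pd

readings = pd.DataFrame({
    "Date": ["1-Jan-19", "1-Jan-19", "2-Jan-19", "2-Jan-19"],
    "City": ["Lahore", "Lahore", "Lahore", "Lahore"],
    "Temperature": [5, 4, 8, 7],
})

# pivot() cannot decide which of the two readings to keep for a cell
try:
    readings.pivot(index="City", columns="Date", values="Temperature")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates the duplicates instead
summary = readings.pivot_table(index="City", columns="Date",
                               values="Temperature",
                               aggfunc=["mean", "max"])
print(pivot_failed)
print(summary)
```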

11.30 Grouper

Example-1

In [741]:
df= pd.read_csv('Weather5.csv',parse_dates=['Date'])
df
Out[741]:
Date City Temperature Humidity
0 2019-01-01 Lahore 5 20
1 2019-01-02 Lahore 8 25
2 2019-01-03 Lahore 1 2
3 2019-07-01 Lahore 43 55
4 2019-07-02 Lahore 50 50
5 2019-07-03 Lahore 48 60
In [742]:
df.pivot_table(index=pd.Grouper(freq='M',key='Date'),columns='City')
Out[742]:
Humidity Temperature
City Lahore Lahore
Date
2019-01-31 15.666667 4.666667
2019-07-31 55.000000 47.000000

Example-2

In [743]:
df.pivot_table(index=pd.Grouper(freq='M',key='Date'),columns='City',aggfunc='sum')
Out[743]:
Humidity Temperature
City Lahore Lahore
Date
2019-01-31 47 14
2019-07-31 165 141
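
As an alternative sketch, the same monthly means can be computed with groupby() and a monthly period label instead of pd.Grouper. The data is an inline copy of the Weather5.csv rows shown above. Recent pandas releases may warn about freq='M' in Grouper and suggest 'ME'; treat that as an assumption to check against your installed version. The period-based form below does not depend on that alias.

```python
import pandas as pd

weather = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03",
                            "2019-07-01", "2019-07-02", "2019-07-03"]),
    "Temperature": [5, 8, 1, 43, 50, 48],
    "Humidity": [20, 25, 2, 55, 50, 60],
})

# to_period('M') labels each row with its calendar month
month = weather["Date"].dt.to_period("M")
monthly = weather.groupby(month)[["Temperature", "Humidity"]].mean()
print(monthly)
```

The result is indexed by month (2019-01, 2019-07) rather than by month-end date.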

11.31 Melt

  • Melt is used to transform or reshape data
  • “Unpivots” a DataFrame from wide format to long format, optionally leaving identifier variables set.
  • This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.
  • More Information
In [744]:
from IPython.display import YouTubeVideo
YouTubeVideo('oY62o-tBHF4',width=900, height=500)
Out[744]:
In [745]:
df= pd.read_csv('Weather6.csv')
df
Out[745]:
Day Lahore Karachi RWP/ISB
0 Monday 10 20 9
1 Tuesday 11 21 8
2 Wednesday 12 22 7
3 Thursday 13 23 6
4 Friday 14 24 5
5 Saturday 15 25 4
6 Sunday 16 26 3

Example-1

In [746]:
df1=pd.melt(df,id_vars=["Day"])
df1
Out[746]:
Day variable value
0 Monday Lahore 10
1 Tuesday Lahore 11
2 Wednesday Lahore 12
3 Thursday Lahore 13
4 Friday Lahore 14
5 Saturday Lahore 15
6 Sunday Lahore 16
7 Monday Karachi 20
8 Tuesday Karachi 21
9 Wednesday Karachi 22
10 Thursday Karachi 23
11 Friday Karachi 24
12 Saturday Karachi 25
13 Sunday Karachi 26
14 Monday RWP/ISB 9
15 Tuesday RWP/ISB 8
16 Wednesday RWP/ISB 7
17 Thursday RWP/ISB 6
18 Friday RWP/ISB 5
19 Saturday RWP/ISB 4
20 Sunday RWP/ISB 3

Example-2

In [747]:
df1=pd.melt(df,id_vars=["Day"],var_name="City", value_name='Temp')
df1
Out[747]:
Day City Temp
0 Monday Lahore 10
1 Tuesday Lahore 11
2 Wednesday Lahore 12
3 Thursday Lahore 13
4 Friday Lahore 14
5 Saturday Lahore 15
6 Sunday Lahore 16
7 Monday Karachi 20
8 Tuesday Karachi 21
9 Wednesday Karachi 22
10 Thursday Karachi 23
11 Friday Karachi 24
12 Saturday Karachi 25
13 Sunday Karachi 26
14 Monday RWP/ISB 9
15 Tuesday RWP/ISB 8
16 Wednesday RWP/ISB 7
17 Thursday RWP/ISB 6
18 Friday RWP/ISB 5
19 Saturday RWP/ISB 4
20 Sunday RWP/ISB 3

Example-3

In [748]:
df1[df1["City"]=="Lahore"]
Out[748]:
Day City Temp
0 Monday Lahore 10
1 Tuesday Lahore 11
2 Wednesday Lahore 12
3 Thursday Lahore 13
4 Friday Lahore 14
5 Saturday Lahore 15
6 Sunday Lahore 16
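
melt() and pivot() are inverse operations. The sketch below uses a shortened inline copy of the Weather6.csv table to go from wide to long and back again; the variable names are illustrative.

```python
import pandas as pd

wide = pd.DataFrame({
    "Day": ["Monday", "Tuesday", "Wednesday"],
    "Lahore": [10, 11, 12],
    "Karachi": [20, 21, 22],
})

# Wide -> long: one row per (Day, City) pair
long_df = pd.melt(wide, id_vars=["Day"], var_name="City", value_name="Temp")

# Long -> wide again; pivot() sorts the row and column labels
back = long_df.pivot(index="Day", columns="City", values="Temp")
print(long_df)
print(back)
```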

11.32 Crosstab

In [749]:
from IPython.display import YouTubeVideo
YouTubeVideo('I_kUj-MfYys',width=900, height=500)
Out[749]:
In [750]:
df=pd.read_csv("Handedness.csv")
df
Out[750]:
Name Nationality Sex Age Handedness
0 Ali Pakistan Male 32 Left
1 Amir Bangadesh Male 28 Left
2 Tauseef KSA Male 28 Left
3 Sarah Pakistan Female 5 Right
4 Umer Pakistan Male 36 Right
5 Ahmed Pakistan Male 30 Right
6 Bilal Bangadesh Male 30 Right
7 Zarah Bangadesh Female 5 Right
8 Ijlal Bangadesh Male 30 Right
9 Yasir KSA Male 30 Right
10 Kamran KSA Male 30 Right
11 Sana KSA Female 5 Right

Example-1

In [751]:
pd.crosstab(df.Nationality,df.Handedness)
Out[751]:
Handedness Left Right
Nationality
Bangadesh 1 3
KSA 1 3
Pakistan 1 3

Example-2

In [752]:
pd.crosstab(df.Sex,df.Handedness)
Out[752]:
Handedness Left Right
Sex
Female 0 3
Male 3 6

Example-3

In [753]:
pd.crosstab(df.Sex,df.Handedness,margins=True)
Out[753]:
Handedness Left Right All
Sex
Female 0 3 3
Male 3 6 9
All 3 9 12

Example-4

In [754]:
pd.crosstab(df.Sex,df.Handedness,margins=True,values=df.Age,aggfunc='mean')
Out[754]:
Handedness Left Right All
Sex
Female NaN 5.000000 5.000000
Male 29.333333 31.000000 30.444444
All 29.333333 22.333333 24.083333

Example-5

In [755]:
pd.crosstab(df.Sex,df.Handedness,normalize='index')
Out[755]:
Handedness Left Right
Sex
Female 0.000000 1.000000
Male 0.333333 0.666667

Example-6

In [756]:
pd.crosstab(df.Sex,[df.Handedness,df.Nationality],margins=True)
Out[756]:
Handedness Left Right All
Nationality Bangadesh KSA Pakistan Bangadesh KSA Pakistan
Sex
Female 0 0 0 1 1 1 3
Male 1 1 1 2 2 2 9
All 1 1 1 3 3 3 12

Example-7

In [757]:
pd.crosstab([df.Sex,df.Nationality],df.Handedness,margins=True)
Out[757]:
Handedness Left Right All
Sex Nationality
Female Bangadesh 0 1 1
KSA 0 1 1
Pakistan 0 1 1
Male Bangadesh 1 2 3
KSA 1 2 3
Pakistan 1 2 3
All 3 9 12
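
A crosstab of two columns is a frequency table, so it can also be built by hand with groupby(). The sketch below uses an inline copy of the Sex and Handedness columns from the table above and shows that both routes give the same counts.

```python
import pandas as pd

people = pd.DataFrame({
    "Sex": ["Male", "Male", "Male", "Female", "Male", "Male",
            "Male", "Female", "Male", "Male", "Male", "Female"],
    "Handedness": ["Left"] * 3 + ["Right"] * 9,
})

counts = pd.crosstab(people["Sex"], people["Handedness"])

# Same table by hand: count each pair, then move Handedness to the columns;
# fill_value=0 covers pairs that never occur (Female/Left)
by_hand = (people.groupby(["Sex", "Handedness"])
                 .size()
                 .unstack(fill_value=0))
print(counts)
print(by_hand)
```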

11.33 groupby

In [758]:
from IPython.display import YouTubeVideo
YouTubeVideo('qy0fDqoMJx8',width=900, height=500)
Out[758]:
In [759]:
df= pd.read_csv('GroupBy.csv')
df
Out[759]:
Date Province City Temperature Windspeed Event
0 1-Jan-19 Punjab Lahore 4 9 Rain
1 2-Jan-19 Punjab Lahore 8 6 Snow
2 3-Jan-19 Punjab Lahore 10 7 Sunny
3 4-Jan-19 Punjab Lahore 6 12 Snow
4 1-Jan-19 Punjab SKT 5 10 Sunny
5 2-Jan-19 Punjab SKT 9 5 Snow
6 3-Jan-19 Punjab SKT 11 4 Rain
7 4-Jan-19 Punjab SKT 7 14 Snow
8 1-Jan-19 Punjab HFZ 8 10 Sunny
9 2-Jan-19 Punjab HFZ 10 5 Snow
10 3-Jan-19 Punjab HFZ 12 4 Rain
11 4-Jan-19 Punjab HFZ 8 14 Snow
12 1-Jan-19 KPK Abbottabad 3 8 Rain
13 2-Jan-19 KPK Abbottabad 7 5 Snow
14 3-Jan-19 KPK Abbottabad 9 6 Sunny
15 4-Jan-19 KPK Abbottabad 5 11 Snow
16 1-Jan-19 KPK Peshawar 4 9 Sunny
17 2-Jan-19 KPK Peshawar 8 4 Snow
18 3-Jan-19 KPK Peshawar 10 3 Rain
19 4-Jan-19 KPK Peshawar 6 13 Snow
20 1-Jan-19 KPK Mansehra 7 9 Sunny
21 2-Jan-19 KPK Mansehra 9 4 Snow
22 3-Jan-19 KPK Mansehra 11 3 Rain
23 4-Jan-19 KPK Mansehra 7 13 Snow

Example-1

In [760]:
df.groupby('City').Temperature.sum()
Out[760]:
City
Abbottabad    24
HFZ           38
Lahore        28
Mansehra      34
Peshawar      28
SKT           32
Name: Temperature, dtype: int64

Example-2

In [761]:
df.groupby(['Province', 'City']).Temperature.sum()
Out[761]:
Province  City      
KPK       Abbottabad    24
          Mansehra      34
          Peshawar      28
Punjab    HFZ           38
          Lahore        28
          SKT           32
Name: Temperature, dtype: int64

Example-3

In [762]:
df.groupby(['Province', 'City']).Temperature.sum().unstack()
Out[762]:
City Abbottabad HFZ Lahore Mansehra Peshawar SKT
Province
KPK 24.0 NaN NaN 34.0 28.0 NaN
Punjab NaN 38.0 28.0 NaN NaN 32.0

11.34 get_group

In [763]:
df= pd.read_csv('GroupBy.csv')
df1=df.groupby('City')
df2=df1.get_group('Lahore')
df2
Out[763]:
Date Province City Temperature Windspeed Event
0 1-Jan-19 Punjab Lahore 4 9 Rain
1 2-Jan-19 Punjab Lahore 8 6 Snow
2 3-Jan-19 Punjab Lahore 10 7 Sunny
3 4-Jan-19 Punjab Lahore 6 12 Snow
In [764]:
df2.mean(numeric_only=True)
Out[764]:
Temperature    7.0
Windspeed      8.5
dtype: float64
In [765]:
df2.describe()
Out[765]:
Temperature Windspeed
count 4.000000 4.000000
mean 7.000000 8.500000
std 2.581989 2.645751
min 4.000000 6.000000
25% 5.500000 6.750000
50% 7.000000 8.000000
75% 8.500000 9.750000
max 10.000000 12.000000
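
get_group() returns the rows of a single group, which is the same selection a boolean mask would make. A small sketch with hypothetical inline rows:

```python
import pandas as pd

weather = pd.DataFrame({
    "City": ["Lahore", "SKT", "Lahore", "SKT"],
    "Temperature": [4, 5, 8, 9],
    "Event": ["Rain", "Sunny", "Snow", "Snow"],
})

groups = weather.groupby("City")
lahore = groups.get_group("Lahore")

# The same rows selected with a boolean mask
lahore_mask = weather[weather["City"] == "Lahore"]

# numeric_only=True skips the text columns, which cannot be averaged
lahore_mean = lahore.mean(numeric_only=True)
print(lahore)
print(lahore_mean)
```

get_group() is convenient when the grouped object already exists and several groups will be inspected one after another.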

11.35 Transform Function in Pandas

  • transform() calls a function on each group and returns an object with the same index as the input, filled with the transformed values.
  • More Information

Example-1

In [766]:
df = pd.read_csv('chipotle.tsv',sep='\t')
df.head()
Out[766]:
order_id quantity item_name choice_description item_price
0 1 1 Chips and Fresh Tomato Salsa NaN $2.39
1 1 1 Izze [Clementine] $3.39
2 1 1 Nantucket Nectar [Apple] $3.39
3 1 1 Chips and Tomatillo-Green Chili Salsa NaN $2.39
4 2 2 Chicken Bowl [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98
In [767]:
df.dtypes
Out[767]:
order_id               int64
quantity               int64
item_name             object
choice_description    object
item_price            object
dtype: object
In [768]:
# Preprocessing (Step-1): strip the dollar sign; regex=False treats "$" as a literal character
df['item_price'] = df['item_price'].str.replace("$","",regex=False)
In [769]:
df.dtypes
Out[769]:
order_id               int64
quantity               int64
item_name             object
choice_description    object
item_price            object
dtype: object
In [770]:
# Preprocessing (Step-2)
df['item_price'] = pd.to_numeric(df['item_price'])
In [771]:
df.dtypes
Out[771]:
order_id                int64
quantity                int64
item_name              object
choice_description     object
item_price            float64
dtype: object
  • Each order has an order_id and consists of one or more rows. To figure out the total price of an order, you sum the item_price for that order_id. For example, here's the total price of order number 1:
In [772]:
df[df.order_id == 1].item_price.sum()
Out[772]:
11.56
  • If you wanted to calculate the total price of every order, you would groupby() order_id and then take the sum of item_price for each group:
In [773]:
df.groupby('order_id').item_price.sum().head()
Out[773]:
order_id
1    11.56
2    16.98
3    12.67
4    21.00
5    13.70
Name: item_price, dtype: float64
  • However, you're not actually limited to aggregating by a single function such as sum().
  • To aggregate by multiple functions, you use the agg() method and pass it a list of functions such as sum() and count():
In [774]:
df.groupby('order_id').item_price.agg(['sum', 'count']).head()
Out[774]:
sum count
order_id
1 11.56 4
2 16.98 1
3 12.67 2
4 21.00 2
5 13.70 2
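  • Suppose you want a new column that lists the total price of each order. The aggregated result cannot be assigned directly, because the output of sum() has one row per order:
In [775]:
len(df.groupby('order_id').item_price.sum())
Out[775]:
1834

...which is smaller than the input to the function, which has one row per item: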

In [776]:
len(df.item_price)
Out[776]:
4622
  • The solution is to use the transform() method, which performs the same calculation but returns output data that is the same shape as the input data:
In [777]:
df['total_price'] = df.groupby('order_id').item_price.transform('sum')
In [778]:
df.head()
Out[778]:
order_id quantity item_name choice_description item_price total_price
0 1 1 Chips and Fresh Tomato Salsa NaN 2.39 11.56
1 1 1 Izze [Clementine] 3.39 11.56
2 1 1 Nantucket Nectar [Apple] 3.39 11.56
3 1 1 Chips and Tomatillo-Green Chili Salsa NaN 2.39 11.56
4 2 2 Chicken Bowl [Tomatillo-Red Chili Salsa (Hot), [Black Beans... 16.98 16.98
  • As you can see, the total price of each order is now listed on every single line.
  • That makes it easy to calculate the percentage of the total order price that each line represents:
In [779]:
df['percent_of_total'] = df.item_price / df.total_price
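
The shape difference between an aggregation and transform() can be checked on a few rows. This sketch uses an inline copy of the first five item prices from the table above instead of chipotle.tsv.

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 1, 1, 1, 2],
    "item_price": [2.39, 3.39, 3.39, 2.39, 16.98],
})

grouped = orders.groupby("order_id")["item_price"]
per_order = grouped.sum()             # one row per order
per_line = grouped.transform("sum")   # one row per input line

orders["total_price"] = per_line
orders["percent_of_total"] = orders["item_price"] / orders["total_price"]
print(len(per_order), len(per_line))
print(orders)
```

Because per_line keeps the original index, it can be assigned as a new column without any merge.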

Example-2

In [780]:
df = pd.DataFrame({"measure": ["a","a","b","a","c","c"],
                     "aq": [10,20,30,20,30,50]})
df
Out[780]:
measure aq
0 a 10
1 a 20
2 b 30
3 a 20
4 c 30
5 c 50
In [781]:
# Express each value as a percentage of the largest value in its group
df["colour"] = (100.0 * df["aq"] / 
                         df.groupby("measure")["aq"].transform("max"))
df
Out[781]:
measure aq colour
0 a 10 50.0
1 a 20 100.0
2 b 30 100.0
3 a 20 100.0
4 c 30 60.0
5 c 50 100.0

11.36 Remove duplicate rows in pandas

  • More Information
  • Return DataFrame with duplicate rows removed, optionally only considering certain columns
In [782]:
from IPython.display import YouTubeVideo
YouTubeVideo('ht5buXUMqkQ',width=900, height=500)
Out[782]:
In [783]:
df= pd.read_csv('DupDS.csv')
df
Out[783]:
Name Student ID School Program
0 Umer Saeed F2017313014 SBE MSDS
1 Ali Saeed F2017313015 SST MSCS
2 Umer Saeed F2017313014 SBE MSDS
3 Umer Saeed F2017313014 SBE MSDS
4 Ahmed Abdullah Saeed F2017313016 SST PHd CS
5 Irfan Kareem F2017313018 SST MS CS
6 Muhammad Ijalal Kahn F2017313020 SST MS CS
7 Hassan Raza F2017313022 SEN MS EE
8 Muhammad Ijalal Kahn F2017313020 SST MS CS
9 Irfan Kareem F2017313018 SST MS CS

Identify duplicated rows

  • More Information
  • Return boolean Series denoting duplicate rows, optionally only considering certain columns
In [784]:
df.duplicated()
Out[784]:
0    False
1    False
2     True
3     True
4    False
5    False
6    False
7    False
8     True
9     True
dtype: bool
In [785]:
df.duplicated().sum()
Out[785]:
4

Find duplicated rows

In [786]:
df.loc[df.duplicated(),:]
Out[786]:
Name Student ID School Program
2 Umer Saeed F2017313014 SBE MSDS
3 Umer Saeed F2017313014 SBE MSDS
8 Muhammad Ijalal Kahn F2017313020 SST MS CS
9 Irfan Kareem F2017313018 SST MS CS
In [787]:
df.loc[df.duplicated(keep='first'),:]
Out[787]:
Name Student ID School Program
2 Umer Saeed F2017313014 SBE MSDS
3 Umer Saeed F2017313014 SBE MSDS
8 Muhammad Ijalal Kahn F2017313020 SST MS CS
9 Irfan Kareem F2017313018 SST MS CS
In [788]:
df.loc[df.duplicated(keep='last'),:]
Out[788]:
Name Student ID School Program
0 Umer Saeed F2017313014 SBE MSDS
2 Umer Saeed F2017313014 SBE MSDS
5 Irfan Kareem F2017313018 SST MS CS
6 Muhammad Ijalal Kahn F2017313020 SST MS CS
In [789]:
df.loc[df.duplicated(keep=False),:]
Out[789]:
Name Student ID School Program
0 Umer Saeed F2017313014 SBE MSDS
2 Umer Saeed F2017313014 SBE MSDS
3 Umer Saeed F2017313014 SBE MSDS
5 Irfan Kareem F2017313018 SST MS CS
6 Muhammad Ijalal Kahn F2017313020 SST MS CS
8 Muhammad Ijalal Kahn F2017313020 SST MS CS
9 Irfan Kareem F2017313018 SST MS CS

Shape of the DataFrame after dropping duplicates

In [790]:
df.drop_duplicates().shape
Out[790]:
(6, 4)
In [791]:
df.drop_duplicates(keep='first').shape
Out[791]:
(6, 4)
In [792]:
df.drop_duplicates(keep='last').shape
Out[792]:
(6, 4)
In [793]:
df.drop_duplicates(keep=False).shape
Out[793]:
(3, 4)

DataFrame after dropping duplicates

In [794]:
df.drop_duplicates()
Out[794]:
Name Student ID School Program
0 Umer Saeed F2017313014 SBE MSDS
1 Ali Saeed F2017313015 SST MSCS
4 Ahmed Abdullah Saeed F2017313016 SST PHd CS
5 Irfan Kareem F2017313018 SST MS CS
6 Muhammad Ijalal Kahn F2017313020 SST MS CS
7 Hassan Raza F2017313022 SEN MS EE
In [795]:
df.drop_duplicates(keep='first')
Out[795]:
Name Student ID School Program
0 Umer Saeed F2017313014 SBE MSDS
1 Ali Saeed F2017313015 SST MSCS
4 Ahmed Abdullah Saeed F2017313016 SST PHd CS
5 Irfan Kareem F2017313018 SST MS CS
6 Muhammad Ijalal Kahn F2017313020 SST MS CS
7 Hassan Raza F2017313022 SEN MS EE
In [796]:
df.drop_duplicates(keep='last')
Out[796]:
Name Student ID School Program
1 Ali Saeed F2017313015 SST MSCS
3 Umer Saeed F2017313014 SBE MSDS
4 Ahmed Abdullah Saeed F2017313016 SST PHd CS
7 Hassan Raza F2017313022 SEN MS EE
8 Muhammad Ijalal Kahn F2017313020 SST MS CS
9 Irfan Kareem F2017313018 SST MS CS
In [797]:
df.drop_duplicates(keep=False)
Out[797]:
Name Student ID School Program
1 Ali Saeed F2017313015 SST MSCS
4 Ahmed Abdullah Saeed F2017313016 SST PHd CS
7 Hassan Raza F2017313022 SEN MS EE
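
Both duplicated() and drop_duplicates() accept a subset argument that restricts the comparison to certain columns. The rows below are hypothetical: the same student ID appears twice with two spellings of the program, so the rows differ as a whole but match on the ID.

```python
import pandas as pd

# Hypothetical rows: one student ID entered with two spellings of the program
students = pd.DataFrame({
    "Name": ["Umer Saeed", "Ali Saeed", "Umer Saeed"],
    "Student ID": ["F2017313014", "F2017313015", "F2017313014"],
    "Program": ["MSDS", "MSCS", "MS DS"],
})

whole_row = students.duplicated().sum()                    # compares every column
by_id = students.duplicated(subset=["Student ID"]).sum()   # compares the ID only

unique_ids = students.drop_duplicates(subset=["Student ID"], keep="first")
print(whole_row, by_id)
print(unique_ids)
```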

11.37 Pandas Stack

In [798]:
from IPython.display import YouTubeVideo
YouTubeVideo('BUOy4RUUepg',width=900, height=500)
Out[798]:

Example-1

In [799]:
df=pd.read_excel("Company.xlsx",header=[0,1],index_col=[0])
df
Out[799]:
Price Price to earning rate
Company Facebook Google Microsoft Facebook Google Microsoft
2019-01-01 150 1000 500 33 43 83
2019-01-02 140 950 400 34 53 93
2019-01-03 135 900 300 35 63 103
2019-01-04 130 800 200 36 73 113
In [800]:
df1=df.stack()
df1
Out[800]:
Price Price to earning rate
Company
2019-01-01 Facebook 150 33
Google 1000 43
Microsoft 500 83
2019-01-02 Facebook 140 34
Google 950 53
Microsoft 400 93
2019-01-03 Facebook 135 35
Google 900 63
Microsoft 300 103
2019-01-04 Facebook 130 36
Google 800 73
Microsoft 200 113
In [801]:
df2=df.stack(level=0)
df2
Out[801]:
Company Facebook Google Microsoft
2019-01-01 Price 150 1000 500
Price to earning rate 33 43 83
2019-01-02 Price 140 950 400
Price to earning rate 34 53 93
2019-01-03 Price 135 900 300
Price to earning rate 35 63 103
2019-01-04 Price 130 800 200
Price to earning rate 36 73 113

11.38 Pandas Unstack

  • More Information
  • Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.
In [802]:
df2.unstack()
Out[802]:
Company Facebook Google Microsoft
Price Price to earning rate Price Price to earning rate Price Price to earning rate
2019-01-01 150 33 1000 43 500 83
2019-01-02 140 34 950 53 400 93
2019-01-03 135 35 900 63 300 103
2019-01-04 130 36 800 73 200 113

Example-2

In [803]:
df=pd.read_excel("Company1.xlsx",header=[0,1,2],index_col=[0])
df
Out[803]:
Price Ratio Income Statement
Price Price to earning rate Net Sales Net Profit
Company Facebook Google Microsoft Facebook Google Microsoft Facebook Google Microsoft Facebook Google Microsoft
2019-01-01 150 1000 500 33 43 83 150 1000 500 33 43 83
2019-01-02 140 950 400 34 53 93 140 950 400 34 53 93
2019-01-03 135 900 300 35 63 103 135 900 300 35 63 103
2019-01-04 130 800 200 36 73 113 130 800 200 36 73 113
In [804]:
df1=df.stack()
df1
Out[804]:
Income Statement Price Ratio
Net Profit Net Sales Price Price to earning rate
Company
2019-01-01 Facebook 33 150 150 33
Google 43 1000 1000 43
Microsoft 83 500 500 83
2019-01-02 Facebook 34 140 140 34
Google 53 950 950 53
Microsoft 93 400 400 93
2019-01-03 Facebook 35 135 135 35
Google 63 900 900 63
Microsoft 103 300 300 103
2019-01-04 Facebook 36 130 130 36
Google 73 800 800 73
Microsoft 113 200 200 113
In [805]:
df2=df.stack(level=0)
df2
Out[805]:
Net Profit Net Sales Price Price to earning rate
Company Facebook Google Microsoft Facebook Google Microsoft Facebook Google Microsoft Facebook Google Microsoft
2019-01-01 Income Statement 33.0 43.0 83.0 150.0 1000.0 500.0 NaN NaN NaN NaN NaN NaN
Price Ratio NaN NaN NaN NaN NaN NaN 150.0 1000.0 500.0 33.0 43.0 83.0
2019-01-02 Income Statement 34.0 53.0 93.0 140.0 950.0 400.0 NaN NaN NaN NaN NaN NaN
Price Ratio NaN NaN NaN NaN NaN NaN 140.0 950.0 400.0 34.0 53.0 93.0
2019-01-03 Income Statement 35.0 63.0 103.0 135.0 900.0 300.0 NaN NaN NaN NaN NaN NaN
Price Ratio NaN NaN NaN NaN NaN NaN 135.0 900.0 300.0 35.0 63.0 103.0
2019-01-04 Income Statement 36.0 73.0 113.0 130.0 800.0 200.0 NaN NaN NaN NaN NaN NaN
Price Ratio NaN NaN NaN NaN NaN NaN 130.0 800.0 200.0 36.0 73.0 113.0
In [806]:
df3=df.stack(level=1)
df3
Out[806]:
Income Statement Price Ratio
Company Facebook Google Microsoft Facebook Google Microsoft
2019-01-01 Net Profit 33.0 43.0 83.0 NaN NaN NaN
Net Sales 150.0 1000.0 500.0 NaN NaN NaN
Price NaN NaN NaN 150.0 1000.0 500.0
Price to earning rate NaN NaN NaN 33.0 43.0 83.0
2019-01-02 Net Profit 34.0 53.0 93.0 NaN NaN NaN
Net Sales 140.0 950.0 400.0 NaN NaN NaN
Price NaN NaN NaN 140.0 950.0 400.0
Price to earning rate NaN NaN NaN 34.0 53.0 93.0
2019-01-03 Net Profit 35.0 63.0 103.0 NaN NaN NaN
Net Sales 135.0 900.0 300.0 NaN NaN NaN
Price NaN NaN NaN 135.0 900.0 300.0
Price to earning rate NaN NaN NaN 35.0 63.0 103.0
2019-01-04 Net Profit 36.0 73.0 113.0 NaN NaN NaN
Net Sales 130.0 800.0 200.0 NaN NaN NaN
Price NaN NaN NaN 130.0 800.0 200.0
Price to earning rate NaN NaN NaN 36.0 73.0 113.0
In [807]:
df4=df.stack(level=2)
df4
Out[807]:
Income Statement Price Ratio
Net Profit Net Sales Price Price to earning rate
Company
2019-01-01 Facebook 33 150 150 33
Google 43 1000 1000 43
Microsoft 83 500 500 83
2019-01-02 Facebook 34 140 140 34
Google 53 950 950 53
Microsoft 93 400 400 93
2019-01-03 Facebook 35 135 135 35
Google 63 900 900 63
Microsoft 103 300 300 103
2019-01-04 Facebook 36 130 130 36
Google 73 800 800 73
Microsoft 113 200 200 113
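  • By default, stack() uses the innermost column level (level=-1), which is why df.stack() and df.stack(level=2) give the same result here.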
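
stack() and unstack() undo each other. The sketch below builds a small two-level column header by hand (hypothetical values, with "PE" standing in for the price-to-earnings column) rather than reading Company.xlsx, then moves the Company level to the rows and back.

```python
import pandas as pd

# Hypothetical two-level column header: (measure, Company)
columns = pd.MultiIndex.from_product(
    [["Price", "PE"], ["Facebook", "Google"]], names=[None, "Company"])
quotes = pd.DataFrame([[150, 1000, 33, 43],
                       [140, 950, 34, 53]],
                      index=["2019-01-01", "2019-01-02"],
                      columns=columns)

stacked = quotes.stack()        # innermost column level (Company) moves to the rows
restored = stacked.unstack()    # innermost row level moves back to the columns
print(stacked)
print(restored)
```

Depending on the pandas version, the column order after the round trip may differ from the original, so select columns by label rather than by position.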

11.39 Python Pandas replicate rows in dataframe

Example-1

In [808]:
df=pd.read_csv("rep.csv")
df
Out[808]:
Name Degree Rep
0 Umer MS DS 2
1 Ali BAA 3
2 Ahmed MS CS 1
3 Bilal MS EE 4
In [809]:
pd.concat([df]*3)
Out[809]:
Name Degree Rep
0 Umer MS DS 2
1 Ali BAA 3
2 Ahmed MS CS 1
3 Bilal MS EE 4
0 Umer MS DS 2
1 Ali BAA 3
2 Ahmed MS CS 1
3 Bilal MS EE 4
0 Umer MS DS 2
1 Ali BAA 3
2 Ahmed MS CS 1
3 Bilal MS EE 4
In [810]:
pd.concat([df]*3, ignore_index=True)
Out[810]:
Name Degree Rep
0 Umer MS DS 2
1 Ali BAA 3
2 Ahmed MS CS 1
3 Bilal MS EE 4
4 Umer MS DS 2
5 Ali BAA 3
6 Ahmed MS CS 1
7 Bilal MS EE 4
8 Umer MS DS 2
9 Ali BAA 3
10 Ahmed MS CS 1
11 Bilal MS EE 4

Example-2

In [811]:
rep2 = df.loc[np.repeat(df.index.values,df.Rep)].reset_index(drop=True)
rep2
Out[811]:
Name Degree Rep
0 Umer MS DS 2
1 Umer MS DS 2
2 Ali BAA 3
3 Ali BAA 3
4 Ali BAA 3
5 Ahmed MS CS 1
6 Bilal MS EE 4
7 Bilal MS EE 4
8 Bilal MS EE 4
9 Bilal MS EE 4
In [812]:
rep2 = rep2.drop("Rep",axis=1).reset_index(drop=True)
rep2
Out[812]:
Name Degree
0 Umer MS DS
1 Umer MS DS
2 Ali BAA
3 Ali BAA
4 Ali BAA
5 Ahmed MS CS
6 Bilal MS EE
7 Bilal MS EE
8 Bilal MS EE
9 Bilal MS EE
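
The same replication can be written with Index.repeat, which avoids the explicit NumPy call. This is a sketch on an inline copy of the rep.csv table; it chains the selection, the column drop, and the index reset into one expression.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Umer", "Ali", "Ahmed", "Bilal"],
                   "Degree": ["MS DS", "BAA", "MS CS", "MS EE"],
                   "Rep": [2, 3, 1, 4]})

# Repeat each index label Rep times, select those rows, then renumber
expanded = (df.loc[df.index.repeat(df["Rep"])]
              .drop(columns="Rep")
              .reset_index(drop=True))
print(expanded)
```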

11.40 Data manipulations

In [813]:
df= pd.read_csv('maybe.csv',header=0)
df
Out[813]:
Paper Year
0 ABC 1980
1 DEF 1980
2 GHI 1980
3 ZBR 1981
4 CCC 1981
5 DDD 1982
6 HAL 1983
7 COR 1983
In [814]:
# Number the rows within each Year, starting at 0
df['count'] = df.groupby('Year').cumcount()
df
Out[814]:
Paper Year count
0 ABC 1980 0
1 DEF 1980 1
2 GHI 1980 2
3 ZBR 1981 0
4 CCC 1981 1
5 DDD 1982 0
6 HAL 1983 0
7 COR 1983 1
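
cumcount() numbers the rows inside each group starting at 0. A small sketch on an inline copy of the first rows of maybe.csv; the position column is an illustrative addition that shows the 1-based variant.

```python
import pandas as pd

papers = pd.DataFrame({"Paper": ["ABC", "DEF", "GHI", "ZBR", "CCC"],
                       "Year": [1980, 1980, 1980, 1981, 1981]})

# 0-based running number within each year
papers["count"] = papers.groupby("Year").cumcount()

# Add 1 for a 1-based position within the year
papers["position"] = papers["count"] + 1
print(papers)
```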

compound

  • Return the compound percentage of the values for the requested axis
  • More Information1
  • More Information2
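  • Each value x contributes a factor (1 + x); the result is the product of those factors minus 1.
  • For column a: [(0+1)(2+1)(4+1)] - 1 = (1)(3)(5) - 1 = 15 - 1 = 14
  • For column b: [(1+1)(3+1)(5+1)] - 1 = (2)(4)(6) - 1 = 48 - 1 = 47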
In [815]:
df = pd.DataFrame(np.arange(6).reshape(3,2), columns=['a', 'b'])
df
Out[815]:
a b
0 0 1
1 2 3
2 4 5
In [816]:
df.compound()
Out[816]:
a    14
b    47
dtype: int32
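
compound() was deprecated in pandas 1.0 and may be missing from the version you have installed. The sketch below computes the same quantity directly from its definition, the product of (1 + x) minus 1, using only prod().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])

# Compound value per column: product of (1 + x) minus 1
compounded = (1 + df).prod() - 1
print(compounded)
```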

cumprod

In [817]:
df = pd.Series([2, 10, np.nan, 4, 3, 0, 1])
df.cumprod()
Out[817]:
0      2.0
1     20.0
2      NaN
3     80.0
4    240.0
5      0.0
6      0.0
dtype: float64
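
cumprod() returns the running product. In the output above the NaN is skipped, and every product after the 0 stays 0. A common use is cumulative growth; the sketch below uses hypothetical period-over-period returns.

```python
import pandas as pd

# Hypothetical period-over-period returns: +10%, -5%, +20%
returns = pd.Series([0.10, -0.05, 0.20])

# Running product of the growth factors gives the cumulative growth so far
growth = (1 + returns).cumprod() - 1
print(growth)
```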

cummax

  • Return cumulative maximum over a DataFrame or Series axis.
  • Returns a DataFrame or Series of the same size containing the cumulative maximum
  • More Information

Example-1

In [818]:
df = pd.Series([2, np.nan, 5, -1, 0])
df
Out[818]:
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64
  • By default, NA values are ignored.
In [819]:
df.cummax()
Out[819]:
0    2.0
1    NaN
2    5.0
3    5.0
4    5.0
dtype: float64
  • To include NA values in the operation, use skipna=False
In [820]:
df.cummax(skipna=False)
Out[820]:
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

Example-2

In [821]:
df = pd.DataFrame([[2.0, 1.0],
                   [3.0, np.nan],
                   [1.0, 0.0]],
                   columns=list('AB'))
df
Out[821]:
A B
0 2.0 1.0
1 3.0 NaN
2 1.0 0.0
  • By default, the operation runs down the rows and computes the cumulative maximum within each column. This is equivalent to axis=None or axis='index'.
In [822]:
df.cummax()
Out[822]:
A B
0 2.0 1.0
1 3.0 NaN
2 3.0 1.0
  • To run across the columns and compute the cumulative maximum within each row, use axis=1.
In [823]:
df.cummax(axis=1)
Out[823]:
A B
0 2.0 2.0
1 3.0 NaN
2 1.0 1.0
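
A cumulative maximum is simply a running "largest value seen so far". For data without missing values it can be cross-checked against NumPy's np.maximum.accumulate, as in this small sketch; the series readings is hypothetical sample data and is not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical NaN-free sample data.
readings = pd.Series([2.0, 5.0, -1.0, 0.0, 7.0])

# pandas running maximum.
running_max = readings.cummax()

# NumPy equivalent for NaN-free input.
numpy_running_max = np.maximum.accumulate(readings.to_numpy())

print(running_max.tolist())
print(numpy_running_max.tolist())
```

The two results agree only when there are no NaN values; with missing data, pandas applies the skipna behaviour shown in Example-1.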

cummin

  • Return cumulative minimum over a DataFrame or Series axis.
  • Returns a DataFrame or Series of the same size containing the cumulative minimum.

Example-1

In [824]:
df = pd.Series([2, np.nan, 5, -1, 0])
df
Out[824]:
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64
  • By default, NA values are ignored.
In [825]:
df.cummin()
Out[825]:
0    2.0
1    NaN
2    2.0
3   -1.0
4   -1.0
dtype: float64
  • To include NA values in the operation, use skipna=False
In [826]:
df.cummin(skipna=False)
Out[826]:
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

Example-2

In [827]:
df = pd.DataFrame([[2.0, 1.0],
                   [3.0, np.nan],
                   [1.0, 0.0]],
                   columns=list('AB'))
  • By default, the operation runs down the rows and computes the cumulative minimum within each column. This is equivalent to axis=None or axis='index'.
In [828]:
df.cummin()
Out[828]:
A B
0 2.0 1.0
1 2.0 NaN
2 1.0 0.0
  • To run across the columns and compute the cumulative minimum within each row, use axis=1.
In [829]:
df.cummin(axis=1)
Out[829]:
A B
0 2.0 1.0
1 3.0 NaN
2 1.0 0.0

cumsum

  • Return cumulative sum over a DataFrame or Series axis.
  • Returns a DataFrame or Series of the same size containing the cumulative sum.

Example-1

In [830]:
df = pd.Series([2, np.nan, 5, -1, 0])
df
Out[830]:
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64
  • By default, NA values are ignored.
In [831]:
df.cumsum()
Out[831]:
0    2.0
1    NaN
2    7.0
3    6.0
4    6.0
dtype: float64
  • To include NA values in the operation, use skipna=False
In [832]:
df.cumsum(skipna=False)
Out[832]:
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

Example-2

In [833]:
df = pd.DataFrame([[2.0, 1.0],
                   [3.0, np.nan],
                   [1.0, 0.0]],
                   columns=list('AB'))
df
Out[833]:
A B
0 2.0 1.0
1 3.0 NaN
2 1.0 0.0
  • By default, the operation runs down the rows and computes the cumulative sum within each column. This is equivalent to axis=None or axis='index'.
In [834]:
df.cumsum()
Out[834]:
A B
0 2.0 1.0
1 5.0 NaN
2 6.0 1.0
  • To run across the columns and compute the cumulative sum within each row, use axis=1.
In [835]:
df.cumsum(axis=1)
Out[835]:
A B
0 2.0 3.0
1 3.0 NaN
2 1.0 1.0
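
Two properties of cumsum are worth checking by hand: with the default skipna=True the last entry equals the plain sum(), and on NaN-free data diff() undoes cumsum() except for the first element. The sketch below illustrates both; the series clean and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

amounts = pd.Series([2, np.nan, 5, -1, 0])

# Running total; the NaN position stays NaN but does not break the total.
running_total = amounts.cumsum()
total = amounts.sum()  # sum() also skips NaN by default

# On NaN-free data, diff() recovers the original steps (first entry is NaN).
clean = pd.Series([2.0, 5.0, -1.0, 0.0])
recovered = clean.cumsum().diff()

print(running_total.iloc[-1], total)
print(recovered.tolist())
```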

Example-3

In [836]:
df= pd.read_csv('maybe.csv',header=0)
df
Out[836]:
Paper Year
0 ABC 1980
1 DEF 1980
2 GHI 1980
3 ZBR 1981
4 CCC 1981
5 DDD 1982
6 HAL 1983
7 COR 1983
In [837]:
df.iat[2, 1]=1983
In [838]:
df
Out[838]:
Paper Year
0 ABC 1980
1 DEF 1980
2 GHI 1983
3 ZBR 1981
4 CCC 1981
5 DDD 1982
6 HAL 1983
7 COR 1983
In [839]:
df['GD'] = (df['Year'] != df['Year'].shift(1)).astype(int).cumsum()
In [840]:
df['count'] = df.groupby('GD').cumcount()  # 0-based position of each row within its run of equal years
df
Out[840]:
Paper Year GD count
0 ABC 1980 1 0
1 DEF 1980 1 1
2 GHI 1983 2 0
3 ZBR 1981 3 0
4 CCC 1981 3 1
5 DDD 1982 4 0
6 HAL 1983 5 0
7 COR 1983 5 1
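
Example-3 combines shift and cumsum to label runs of consecutive equal values, so that the two separate 1983 blocks get different group ids. The steps can be sketched one at a time as follows, assuming the Year values shown in the output above; the variable names are hypothetical.

```python
import pandas as pd

# Year column after the edit at position [2, 1] (values copied from the output above).
years = pd.Series([1980, 1980, 1983, 1981, 1981, 1982, 1983, 1983], name="Year")

# True wherever the value differs from the previous row.
# The first row is compared with NaN, so it is always True.
starts_new_run = years != years.shift(1)

# Summing the True flags gives an id that increases at every change.
run_id = starts_new_run.astype(int).cumsum()

# cumcount then numbers the rows inside each consecutive run.
position_in_run = years.groupby(run_id).cumcount()

print(pd.DataFrame({"Year": years, "GD": run_id, "count": position_in_run}))
```

Grouping on Year alone would merge rows 2, 6 and 7 into one 1983 group; grouping on the run id keeps non-adjacent blocks apart.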

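expanding

  • An expanding window covers all rows from the start up to the current row. expanding(2) requires at least 2 observations, so the first result is NaN.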

In [841]:
df = pd.DataFrame({'B': [0, 3, 2, 7, 4]})
df
Out[841]:
B
0 0
1 3
2 2
3 7
4 4
In [842]:
df.expanding(2).sum()
Out[842]:
B
0 NaN
1 3.0
2 5.0
3 12.0
4 16.0

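rolling

  • A rolling window covers a fixed number of the most recent rows. rolling(2) sums the current row and the one before it, so the first result is NaN.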

In [843]:
df = pd.DataFrame({'B': [0, 3, 2, 7, 4]})
df
Out[843]:
B
0 0
1 3
2 2
3 7
4 4
In [844]:
df.rolling(2).sum()
Out[844]:
B
0 NaN
1 3.0
2 5.0
3 9.0
4 11.0
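
The difference between the two window types is easiest to see side by side on the same column. In this minimal sketch the first positional argument of each call is spelled out by keyword (min_periods for expanding, window for rolling); the names frame and comparison are hypothetical.

```python
import pandas as pd

frame = pd.DataFrame({"B": [0, 3, 2, 7, 4]})

# Window grows from the first row; needs at least 2 values before reporting.
expanding_sum = frame["B"].expanding(min_periods=2).sum()

# Window always holds the 2 most recent rows.
rolling_sum = frame["B"].rolling(window=2).sum()

# An expanding sum matches cumsum once min_periods is satisfied.
running_total = frame["B"].cumsum()

comparison = pd.DataFrame({
    "B": frame["B"],
    "expanding": expanding_sum,
    "rolling": rolling_sum,
    "cumsum": running_total,
})
print(comparison)
```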

pct_change

  • Percentage change between the current and a prior element.
  • Computes the percentage change from the immediately previous row by default.
  • This is useful in comparing the percentage of change in a time series of elements.
In [845]:
df= pd.read_csv('AAPL.csv',parse_dates=['Date'])
df
Out[845]:
Date Open High Low Close Adj Close Volume
0 2018-01-02 170.160004 172.300003 169.259995 172.259995 169.712067 25555900
1 2018-01-03 172.529999 174.550003 171.960007 172.229996 169.682510 29517900
2 2018-01-04 172.539993 173.470001 172.080002 173.029999 170.470703 22434600
3 2018-01-05 173.440002 175.369995 173.050003 175.000000 172.411560 23660000
4 2018-01-08 174.350006 175.610001 173.929993 174.350006 171.771179 20567800
5 2018-01-09 174.550003 175.059998 173.410004 174.330002 171.751465 21584000
6 2018-01-10 173.160004 174.300003 173.000000 174.289993 171.712051 23959900
7 2018-01-11 174.589996 175.490005 174.490005 175.279999 172.687408 18667700
8 2018-01-12 176.179993 177.360001 175.649994 177.089996 174.470642 25226000
9 2018-01-15 177.899994 179.389999 176.139999 176.190002 173.583969 29565900
10 2018-01-16 177.899994 179.389999 176.139999 176.190002 173.583969 29565900
11 2018-01-17 176.149994 179.250000 175.070007 179.100006 176.450928 34386800
12 2018-01-18 179.369995 180.100006 178.250000 179.259995 176.608551 31193400
13 2018-01-19 178.610001 179.580002 177.410004 178.460007 175.820389 32425100
14 2018-01-22 177.300003 177.779999 176.600006 177.000000 174.381973 27108600
15 2018-01-23 177.300003 179.440002 176.820007 177.039993 174.421387 32689100
16 2018-01-24 177.250000 177.300003 173.199997 174.220001 171.643082 51105100
17 2018-01-25 174.509995 174.949997 170.529999 171.110001 168.579086 41529000
18 2018-01-26 172.000000 172.000000 170.059998 171.509995 168.973175 39143000
19 2018-01-29 170.160004 170.160004 167.070007 167.960007 165.475677 50640400
20 2018-01-30 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
21 2018-01-31 166.869995 168.440002 166.500000 167.429993 164.953522 32478900
22 2018-02-01 167.169998 168.619995 166.759995 167.779999 165.298355 47230800
23 2018-02-02 166.000000 166.800003 160.100006 160.500000 158.126022 86593800
24 2018-02-05 159.100006 163.880005 156.000000 156.490005 154.175354 72738500
25 2018-02-06 154.830002 163.720001 154.000000 163.029999 160.618591 68243800
26 2018-02-07 163.089996 163.399994 159.070007 159.539993 157.180222 51608600
27 2018-02-08 160.289993 161.000000 155.029999 155.149994 152.855148 54390500
28 2018-02-09 157.070007 157.889999 150.240005 156.410004 154.724808 70672600
29 2018-02-12 158.500000 163.889999 157.509995 162.710007 160.956924 60819500
... ... ... ... ... ... ... ...
207 2018-10-25 217.710007 221.380005 216.750000 219.800003 219.035751 29855800
208 2018-10-26 215.899994 220.190002 212.669998 216.300003 215.547913 47258400
209 2018-10-29 219.190002 219.690002 206.089996 212.240005 211.502045 45935500
210 2018-10-30 211.149994 215.179993 209.270004 213.300003 212.558350 36660000
211 2018-10-31 216.880005 220.449997 216.619995 218.860001 218.099014 38358900
212 2018-11-01 219.050003 222.360001 216.809998 222.220001 221.447327 58323200
213 2018-11-02 209.550003 213.649994 205.429993 207.479996 206.758575 91328700
214 2018-11-05 204.300003 204.389999 198.169998 201.589996 200.889053 66163700
215 2018-11-06 201.919998 204.720001 201.690002 203.770004 203.061493 31882900
216 2018-11-07 205.970001 210.059998 204.130005 209.949997 209.219986 33424400
217 2018-11-08 209.979996 210.119995 206.750000 208.490005 208.490005 25362600
218 2018-11-09 205.550003 206.009995 202.250000 204.470001 204.470001 34365800
219 2018-11-12 199.000000 199.850006 193.789993 194.169998 194.169998 51135500
220 2018-11-13 191.630005 197.179993 191.449997 192.229996 192.229996 46882900
221 2018-11-14 193.899994 194.479996 185.929993 186.800003 186.800003 60801000
222 2018-11-15 188.389999 191.970001 186.899994 191.410004 191.410004 46478800
223 2018-11-16 190.500000 194.970001 189.460007 193.529999 193.529999 36928300
224 2018-11-19 190.000000 190.699997 184.990005 185.860001 185.860001 41925300
225 2018-11-20 178.369995 181.470001 175.509995 176.979996 176.979996 67825200
226 2018-11-21 179.729996 180.270004 176.550003 176.779999 176.779999 31124200
227 2018-11-23 174.940002 176.600006 172.100006 172.289993 172.289993 23624000
228 2018-11-26 174.240005 174.949997 170.259995 174.619995 174.619995 44738600
229 2018-11-27 171.509995 174.770004 170.880005 174.240005 174.240005 41387400
230 2018-11-28 176.729996 181.289993 174.929993 180.940002 180.940002 46062500
231 2018-11-29 182.660004 182.800003 177.699997 179.550003 179.550003 41770000
232 2018-11-30 180.289993 180.330002 177.029999 178.580002 178.580002 39531500
233 2018-12-03 184.460007 184.940002 181.210007 184.820007 184.820007 40802500
234 2018-12-04 180.949997 182.389999 176.270004 176.690002 176.690002 41344300
235 2018-12-06 171.759995 174.779999 170.419998 174.720001 174.720001 43098400
236 2018-12-07 173.490005 174.490005 168.300003 168.490005 168.490005 41695700

237 rows × 7 columns

In [846]:
df['dchange']=100*df['Close'].pct_change()
df
Out[846]:
Date Open High Low Close Adj Close Volume dchange
0 2018-01-02 170.160004 172.300003 169.259995 172.259995 169.712067 25555900 NaN
1 2018-01-03 172.529999 174.550003 171.960007 172.229996 169.682510 29517900 -0.017415
2 2018-01-04 172.539993 173.470001 172.080002 173.029999 170.470703 22434600 0.464497
3 2018-01-05 173.440002 175.369995 173.050003 175.000000 172.411560 23660000 1.138531
4 2018-01-08 174.350006 175.610001 173.929993 174.350006 171.771179 20567800 -0.371425
5 2018-01-09 174.550003 175.059998 173.410004 174.330002 171.751465 21584000 -0.011473
6 2018-01-10 173.160004 174.300003 173.000000 174.289993 171.712051 23959900 -0.022950
7 2018-01-11 174.589996 175.490005 174.490005 175.279999 172.687408 18667700 0.568022
8 2018-01-12 176.179993 177.360001 175.649994 177.089996 174.470642 25226000 1.032632
9 2018-01-15 177.899994 179.389999 176.139999 176.190002 173.583969 29565900 -0.508213
10 2018-01-16 177.899994 179.389999 176.139999 176.190002 173.583969 29565900 0.000000
11 2018-01-17 176.149994 179.250000 175.070007 179.100006 176.450928 34386800 1.651628
12 2018-01-18 179.369995 180.100006 178.250000 179.259995 176.608551 31193400 0.089329
13 2018-01-19 178.610001 179.580002 177.410004 178.460007 175.820389 32425100 -0.446272
14 2018-01-22 177.300003 177.779999 176.600006 177.000000 174.381973 27108600 -0.818114
15 2018-01-23 177.300003 179.440002 176.820007 177.039993 174.421387 32689100 0.022595
16 2018-01-24 177.250000 177.300003 173.199997 174.220001 171.643082 51105100 -1.592856
17 2018-01-25 174.509995 174.949997 170.529999 171.110001 168.579086 41529000 -1.785099
18 2018-01-26 172.000000 172.000000 170.059998 171.509995 168.973175 39143000 0.233764
19 2018-01-29 170.160004 170.160004 167.070007 167.960007 165.475677 50640400 -2.069843
20 2018-01-30 165.529999 167.369995 164.699997 166.970001 164.500336 46048200 -0.589430
21 2018-01-31 166.869995 168.440002 166.500000 167.429993 164.953522 32478900 0.275494
22 2018-02-01 167.169998 168.619995 166.759995 167.779999 165.298355 47230800 0.209046
23 2018-02-02 166.000000 166.800003 160.100006 160.500000 158.126022 86593800 -4.339015
24 2018-02-05 159.100006 163.880005 156.000000 156.490005 154.175354 72738500 -2.498439
25 2018-02-06 154.830002 163.720001 154.000000 163.029999 160.618591 68243800 4.179177
26 2018-02-07 163.089996 163.399994 159.070007 159.539993 157.180222 51608600 -2.140714
27 2018-02-08 160.289993 161.000000 155.029999 155.149994 152.855148 54390500 -2.751661
28 2018-02-09 157.070007 157.889999 150.240005 156.410004 154.724808 70672600 0.812124
29 2018-02-12 158.500000 163.889999 157.509995 162.710007 160.956924 60819500 4.027877
... ... ... ... ... ... ... ... ...
207 2018-10-25 217.710007 221.380005 216.750000 219.800003 219.035751 29855800 2.189784
208 2018-10-26 215.899994 220.190002 212.669998 216.300003 215.547913 47258400 -1.592357
209 2018-10-29 219.190002 219.690002 206.089996 212.240005 211.502045 45935500 -1.877022
210 2018-10-30 211.149994 215.179993 209.270004 213.300003 212.558350 36660000 0.499434
211 2018-10-31 216.880005 220.449997 216.619995 218.860001 218.099014 38358900 2.606656
212 2018-11-01 219.050003 222.360001 216.809998 222.220001 221.447327 58323200 1.535228
213 2018-11-02 209.550003 213.649994 205.429993 207.479996 206.758575 91328700 -6.633069
214 2018-11-05 204.300003 204.389999 198.169998 201.589996 200.889053 66163700 -2.838828
215 2018-11-06 201.919998 204.720001 201.690002 203.770004 203.061493 31882900 1.081407
216 2018-11-07 205.970001 210.059998 204.130005 209.949997 209.219986 33424400 3.032828
217 2018-11-08 209.979996 210.119995 206.750000 208.490005 208.490005 25362600 -0.695400
218 2018-11-09 205.550003 206.009995 202.250000 204.470001 204.470001 34365800 -1.928152
219 2018-11-12 199.000000 199.850006 193.789993 194.169998 194.169998 51135500 -5.037415
220 2018-11-13 191.630005 197.179993 191.449997 192.229996 192.229996 46882900 -0.999126
221 2018-11-14 193.899994 194.479996 185.929993 186.800003 186.800003 60801000 -2.824738
222 2018-11-15 188.389999 191.970001 186.899994 191.410004 191.410004 46478800 2.467881
223 2018-11-16 190.500000 194.970001 189.460007 193.529999 193.529999 36928300 1.107568
224 2018-11-19 190.000000 190.699997 184.990005 185.860001 185.860001 41925300 -3.963209
225 2018-11-20 178.369995 181.470001 175.509995 176.979996 176.979996 67825200 -4.777792
226 2018-11-21 179.729996 180.270004 176.550003 176.779999 176.779999 31124200 -0.113005
227 2018-11-23 174.940002 176.600006 172.100006 172.289993 172.289993 23624000 -2.539883
228 2018-11-26 174.240005 174.949997 170.259995 174.619995 174.619995 44738600 1.352372
229 2018-11-27 171.509995 174.770004 170.880005 174.240005 174.240005 41387400 -0.217610
230 2018-11-28 176.729996 181.289993 174.929993 180.940002 180.940002 46062500 3.845269
231 2018-11-29 182.660004 182.800003 177.699997 179.550003 179.550003 41770000 -0.768210
232 2018-11-30 180.289993 180.330002 177.029999 178.580002 178.580002 39531500 -0.540240
233 2018-12-03 184.460007 184.940002 181.210007 184.820007 184.820007 40802500 3.494235
234 2018-12-04 180.949997 182.389999 176.270004 176.690002 176.690002 41344300 -4.398877
235 2018-12-06 171.759995 174.779999 170.419998 174.720001 174.720001 43098400 -1.114948
236 2018-12-07 173.490005 174.490005 168.300003 168.490005 168.490005 41695700 -3.565703

237 rows × 8 columns
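
The dchange column can be reproduced by hand, which makes the definition of pct_change explicit: each value is divided by the previous one and 1 is subtracted. A minimal sketch using the first four Close prices from the table above (the names close, manual and builtin are hypothetical):

```python
import pandas as pd

# First four closing prices copied from the table above.
close = pd.Series([172.259995, 172.229996, 173.029999, 175.000000], name="Close")

# Manual definition: (current / previous - 1), scaled to percent.
manual = 100 * (close / close.shift(1) - 1)

# Built-in equivalent used in the cell above.
builtin = 100 * close.pct_change()

print(pd.DataFrame({"Close": close, "manual": manual, "builtin": builtin}))
```

The first row has no previous value to compare against, so both versions return NaN there.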

11.41 Data Frame Formatting in Pandas

In [847]:
from IPython.display import YouTubeVideo
YouTubeVideo('yiO43TQ4xvc',width=900, height=500)
Out[847]:

Display Rows Method in Pandas

In [848]:
df_who= pd.read_csv('WHO_csv.csv')
pd.get_option('display.max_rows')
Out[848]:
60

Example-1

In [849]:
df_who= pd.read_csv('WHO_csv.csv')
pd.set_option('display.max_rows',None)
df_who
Out[849]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
6 Argentina Americas 41087 24.42 14.97 2.20 76 14.2 134.92 97.8 17130.0 NaN NaN
7 Armenia Europe 2969 20.34 14.06 1.74 71 16.4 103.57 99.6 6100.0 NaN NaN
8 Australia Western Pacific 23050 18.95 19.46 1.89 82 4.9 108.34 NaN 38110.0 96.9 97.5
9 Austria Europe 8464 14.51 23.52 1.44 81 4.0 154.78 NaN 42050.0 NaN NaN
10 Azerbaijan Europe 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
12 Bahrain Eastern Mediterranean 1318 20.16 3.38 2.12 79 9.6 127.96 91.9 NaN NaN NaN
13 Bangladesh South-East Asia 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN
14 Barbados Americas 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
15 Belarus Europe 9405 15.10 19.31 1.47 71 5.2 111.88 NaN 14460.0 NaN NaN
16 Belgium Europe 11060 16.88 23.81 1.85 80 4.2 116.61 NaN 39190.0 98.9 99.2
17 Belize Americas 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
18 Benin Africa 10051 42.95 4.54 5.01 57 89.5 85.33 42.4 1620.0 NaN NaN
19 Bhutan South-East Asia 742 28.53 6.90 2.32 67 44.6 65.58 NaN 5570.0 88.3 91.5
20 Bolivia (Plurinational State of) Americas 10496 35.23 7.28 3.31 67 41.4 82.82 NaN 4890.0 91.2 91.5
21 Bosnia and Herzegovina Europe 3834 16.35 20.52 1.26 76 6.7 84.52 97.9 9190.0 86.5 88.4
22 Botswana Africa 2004 33.75 5.63 2.71 66 53.3 142.82 84.5 14550.0 NaN NaN
23 Brazil Americas 199000 24.56 10.81 1.82 74 14.4 124.26 NaN 11420.0 NaN NaN
24 Brunei Darussalam Western Pacific 412 25.75 7.03 2.03 77 8.0 109.17 95.2 NaN NaN NaN
25 Bulgaria Europe 7278 13.53 26.11 1.51 74 12.1 140.68 NaN 14160.0 99.3 99.7
26 Burkina Faso Africa 16460 45.66 3.88 5.78 56 102.4 45.27 NaN 1300.0 60.7 55.9
27 Burundi Africa 9850 44.20 3.87 6.21 53 104.3 22.33 67.2 610.0 NaN NaN
28 Cambodia Western Pacific 14865 31.23 7.67 2.93 65 39.7 96.17 NaN 2230.0 96.4 95.4
29 Cameroon Africa 21700 43.08 4.89 4.94 53 94.9 52.35 NaN 2330.0 99.6 87.4
30 Canada Americas 34838 16.37 20.82 1.66 82 5.3 79.73 NaN 39660.0 NaN NaN
31 Cape Verde Africa 494 30.17 7.05 2.38 72 22.2 79.19 84.3 3980.0 94.6 92.4
32 Central African Republic Africa 4525 40.07 5.74 4.54 48 128.6 40.65 56.0 810.0 81.3 60.6
33 Chad Africa 12448 48.52 3.80 6.49 51 149.8 31.80 34.5 1360.0 NaN NaN
34 Chile Americas 17465 21.38 13.80 1.84 79 9.1 129.71 NaN 16330.0 94.3 94.4
35 China Western Pacific 1390000 17.95 13.42 1.66 76 14.0 73.19 94.3 8390.0 NaN NaN
36 Colombia Americas 47704 28.03 9.19 2.35 78 17.6 98.45 93.4 9560.0 91.7 91.3
37 Comoros Africa 718 42.17 4.50 4.85 62 77.6 28.71 74.9 1110.0 NaN NaN
38 Congo Africa 4337 42.37 5.13 5.05 58 96.0 93.84 NaN 3240.0 92.3 89.3
39 Cook Islands Western Pacific 21 30.61 9.07 NaN 77 10.6 NaN NaN NaN 97.6 99.3
40 Costa Rica Americas 4805 23.94 10.15 1.83 79 9.9 92.20 96.2 11860.0 NaN NaN
41 Ivory Coast Africa 19840 41.48 5.10 4.91 56 107.6 86.06 56.2 1710.0 NaN NaN
42 Croatia Europe 4307 14.98 24.69 1.48 77 4.7 116.37 98.8 18760.0 94.8 97.0
43 Cuba Americas 11271 16.58 17.95 1.46 78 5.5 11.69 99.8 NaN 100.0 99.7
44 Cyprus Europe 1129 17.16 16.92 1.47 81 3.2 97.71 98.3 NaN 99.1 99.5
45 Czech Republic Europe 10660 14.56 23.23 1.53 78 3.8 123.44 NaN 24370.0 NaN NaN
46 Democratic People's Republic of Korea South-East Asia 24763 21.98 12.74 2.00 69 28.8 4.09 NaN NaN NaN NaN
47 Democratic Republic of the Congo Africa 65705 45.11 4.51 6.15 49 145.7 23.09 66.8 340.0 NaN NaN
48 Denmark Europe 5598 17.66 23.90 1.88 79 3.7 128.47 NaN 41900.0 94.8 96.9
49 Djibouti Eastern Mediterranean 860 33.72 5.96 3.53 58 80.9 21.32 NaN NaN NaN NaN
50 Dominica Americas 72 25.96 12.35 NaN 74 12.6 164.02 NaN 13000.0 NaN NaN
51 Dominican Republic Americas 10277 30.53 8.97 2.55 73 27.1 87.22 89.5 9420.0 95.5 90.4
52 Ecuador Americas 15492 30.29 9.21 2.62 76 23.3 104.55 91.9 8510.0 NaN NaN
53 Egypt Eastern Mediterranean 80722 31.25 8.62 2.85 73 21.0 101.08 72.0 6120.0 NaN NaN
54 El Salvador Americas 6297 30.62 9.64 2.24 72 15.9 133.54 84.5 6640.0 95.2 95.5
55 Equatorial Guinea Africa 736 38.95 4.53 5.04 54 100.3 59.15 93.9 25620.0 56.5 56.0
56 Eritrea Africa 6131 43.10 3.73 4.88 61 51.8 4.47 67.8 580.0 37.2 32.5
57 Estonia Europe 1291 15.69 23.92 1.62 76 3.6 138.98 99.8 20850.0 97.7 97.0
58 Ethiopia Africa 91729 43.29 5.17 4.77 60 68.3 16.67 NaN 1110.0 84.8 79.5
59 Fiji Western Pacific 875 28.88 8.38 2.64 70 22.4 83.72 NaN 4610.0 NaN NaN
60 Finland Europe 5408 16.42 25.90 1.85 81 2.9 166.02 NaN 37670.0 97.7 97.9
61 France Europe 63937 18.26 23.82 1.98 82 4.1 94.79 NaN 35910.0 99.1 99.3
62 Gabon Africa 1633 38.49 7.38 4.18 62 62.0 117.32 88.4 13740.0 NaN NaN
63 Gambia Africa 1791 45.90 3.72 5.79 58 72.9 78.89 50.0 1750.0 68.2 70.4
64 Georgia Europe 4358 17.62 19.47 1.82 72 19.9 102.31 99.7 5350.0 NaN NaN
65 Germany Europe 82800 13.17 26.72 1.40 81 4.1 132.30 NaN 40230.0 NaN NaN
66 Ghana Africa 25366 38.59 5.40 3.99 64 72.0 84.78 67.3 1810.0 NaN NaN
67 Greece Europe 11125 14.60 25.41 1.51 81 4.8 106.48 97.2 25100.0 98.8 99.3
68 Grenada Americas 105 26.96 9.72 2.22 74 13.5 NaN NaN 10350.0 NaN NaN
69 Guatemala Americas 15083 40.80 6.56 3.91 69 32.0 140.38 75.2 4760.0 98.6 97.5
70 Guinea Africa 11451 42.46 5.03 5.09 55 101.2 44.02 41.0 1020.0 85.2 72.1
71 Guinea-Bissau Africa 1664 41.55 5.06 5.05 50 129.1 56.18 54.2 1240.0 76.7 73.3
72 Guyana Americas 795 36.77 5.18 2.64 63 35.2 69.94 NaN NaN 82.4 85.9
73 Haiti Americas 10174 35.35 6.70 3.28 63 75.6 41.49 NaN 1180.0 NaN NaN
74 Honduras Americas 7936 35.72 6.41 3.10 74 22.9 103.97 84.8 3820.0 94.8 97.0
75 Hungary Europe 9976 14.62 23.41 1.38 75 6.2 117.30 99.0 20310.0 97.8 98.3
76 Iceland Europe 326 20.71 17.62 2.11 82 2.3 106.08 NaN 31020.0 98.8 99.2
77 India South-East Asia 1240000 29.43 8.10 2.53 65 56.3 72.00 NaN 3590.0 NaN NaN
78 Indonesia South-East Asia 247000 29.27 7.86 2.40 69 31.0 103.09 NaN 4500.0 NaN NaN
79 Iran (Islamic Republic of) Eastern Mediterranean 76424 23.68 7.82 1.91 73 17.6 74.93 NaN NaN NaN NaN
80 Iraq Eastern Mediterranean 32778 40.51 4.95 4.15 69 34.4 78.12 78.2 3750.0 NaN NaN
81 Ireland Europe 4576 21.54 16.59 2.00 81 4.0 108.41 NaN 34180.0 99.4 100.0
82 Israel Europe 7644 27.53 15.15 2.92 82 4.2 121.66 NaN 27110.0 97.0 97.8
83 Italy Europe 60885 14.04 26.97 1.45 82 3.8 157.93 98.9 32400.0 99.6 98.5
84 Jamaica Americas 2769 27.78 10.98 2.31 75 16.8 108.12 86.6 NaN 83.4 81.4
85 Japan Western Pacific 127000 13.12 31.92 1.39 83 3.0 104.95 NaN 35330.0 NaN NaN
86 Jordan Eastern Mediterranean 7009 34.13 5.30 3.39 74 19.1 118.20 92.6 5930.0 90.8 90.7
87 Kazakhstan Europe 16271 25.46 10.04 2.52 67 18.7 155.74 99.7 11250.0 NaN NaN
88 Kenya Africa 43178 42.37 4.25 4.54 60 72.9 67.49 87.4 1710.0 NaN NaN
89 Kiribati Western Pacific 101 30.10 8.84 3.01 67 59.9 13.64 NaN 3300.0 NaN NaN
90 Kuwait Eastern Mediterranean 3250 24.90 3.80 2.65 80 11.0 175.09 NaN NaN NaN NaN
91 Kyrgyzstan Europe 5474 30.21 6.34 3.03 69 26.6 116.40 NaN 2180.0 95.5 95.1
92 Lao People's Democratic Republic Western Pacific 6646 35.61 5.76 3.20 68 71.8 87.16 NaN 2580.0 98.1 95.4
93 Latvia Europe 2060 14.57 24.24 1.57 74 8.7 102.94 99.8 17700.0 95.0 96.8
94 Lebanon Eastern Mediterranean 4647 21.64 12.03 1.50 74 9.3 78.65 NaN 14470.0 93.5 92.9
95 Lesotho Africa 2052 36.75 6.31 3.15 50 99.6 56.17 89.6 2050.0 72.2 75.3
96 Liberia Africa 4190 43.06 4.76 4.95 59 74.8 49.17 60.8 540.0 NaN NaN
97 Libya Eastern Mediterranean 6155 29.45 6.96 2.47 65 15.4 155.70 89.2 NaN NaN NaN
98 Lithuania Europe 3028 15.13 20.57 1.49 74 5.4 151.30 99.7 19640.0 95.6 95.8
99 Luxembourg Europe 524 17.46 19.15 1.65 82 2.2 148.27 NaN 64260.0 93.6 95.7
100 Madagascar Africa 22294 42.72 4.45 4.59 66 58.2 40.65 NaN 950.0 NaN NaN
101 Malawi Africa 15906 45.44 4.92 5.55 58 71.0 25.69 74.8 870.0 NaN NaN
102 Malaysia Western Pacific 29240 26.65 8.21 1.99 74 8.5 127.04 93.1 15650.0 NaN NaN
103 Maldives South-East Asia 338 29.03 6.65 2.31 77 10.5 165.72 NaN 7430.0 96.5 96.5
104 Mali Africa 14854 47.14 4.29 6.85 51 128.0 68.32 31.1 1040.0 70.6 60.8
105 Malta Europe 428 14.98 22.87 1.37 80 6.8 124.86 NaN NaN 93.3 94.3
106 Marshall Islands Western Pacific 53 30.10 8.84 NaN 60 37.9 NaN NaN NaN NaN NaN
107 Mauritania Africa 3796 40.22 4.94 4.78 59 84.0 93.60 58.0 2400.0 72.8 76.0
108 Mauritius Africa 1240 20.17 13.23 1.51 74 15.1 99.04 88.5 14330.0 NaN NaN
109 Mexico Americas 121000 29.02 9.18 2.25 75 16.2 82.38 93.1 15390.0 99.2 99.9
110 Micronesia (Federated States of) Western Pacific 103 35.81 6.67 3.40 69 38.5 NaN NaN 3580.0 NaN NaN
111 Monaco Europe 38 18.26 23.82 NaN 82 3.8 89.73 NaN NaN NaN NaN
112 Mongolia Western Pacific 2796 27.05 5.80 2.45 68 27.5 105.08 97.4 4290.0 99.6 98.5
113 Montenegro Europe 621 19.01 18.58 1.69 76 5.9 NaN 98.4 13700.0 NaN NaN
114 Morocco Eastern Mediterranean 32521 27.85 7.61 2.65 72 31.1 113.26 NaN 4880.0 NaN NaN
115 Mozambique Africa 25203 45.38 5.01 5.34 53 89.7 32.83 56.1 970.0 94.6 89.4
116 Myanmar South-East Asia 52797 25.28 8.15 1.98 65 52.3 2.57 92.3 NaN NaN NaN
117 Namibia Africa 2259 36.59 5.38 3.17 65 38.7 96.39 88.8 6560.0 83.8 88.5
118 Nauru Western Pacific 10 30.10 8.84 NaN 71 37.1 65.00 NaN NaN NaN NaN
119 Nepal South-East Asia 27474 35.58 7.65 2.50 68 41.6 43.81 60.3 1260.0 NaN NaN
120 Netherlands Europe 16714 17.21 23.02 1.76 81 4.1 NaN NaN 43140.0 NaN NaN
121 New Zealand Western Pacific 4460 20.26 19.01 2.10 81 5.7 109.19 NaN NaN 99.3 99.6
122 Nicaragua Americas 5992 33.37 6.59 2.59 73 24.4 82.15 NaN 3730.0 93.2 94.5
123 Niger Africa 17157 49.99 4.26 7.58 56 113.5 29.52 NaN 720.0 64.2 52.0
124 Nigeria Africa 169000 44.23 4.49 6.02 53 123.7 58.58 61.3 2290.0 60.1 54.8
125 Niue Western Pacific 1 30.61 9.07 NaN 72 25.1 NaN NaN NaN NaN NaN
126 Norway Europe 4994 18.64 21.41 1.93 81 2.8 115.62 NaN 61460.0 99.1 99.2
127 Oman Eastern Mediterranean 3314 24.19 3.99 2.90 72 11.6 168.97 NaN NaN NaN NaN
128 Pakistan Eastern Mediterranean 179000 34.31 6.44 3.35 67 85.9 61.61 NaN 2870.0 81.3 66.5
129 Palau Western Pacific 21 30.10 8.84 NaN 72 20.8 74.94 NaN 11080.0 NaN NaN
130 Panama Americas 3802 28.65 10.13 2.52 77 18.5 188.60 94.1 14510.0 99.1 98.2
131 Papua New Guinea Western Pacific 7167 38.37 4.79 3.90 63 63.0 34.22 60.6 2570.0 NaN NaN
132 Paraguay Americas 6687 32.78 8.01 2.93 75 22.0 99.40 93.9 5390.0 84.4 83.9
133 Peru Americas 29988 29.18 9.12 2.48 77 18.2 110.41 NaN 9440.0 97.8 98.5
134 Philippines Western Pacific 96707 34.53 6.21 3.11 69 29.8 99.30 NaN 4140.0 NaN NaN
135 Poland Europe 38211 14.91 20.48 1.39 76 5.0 130.97 99.5 20430.0 96.9 96.7
136 Portugal Europe 10604 14.92 24.39 1.33 80 3.6 115.39 95.2 24440.0 99.1 99.7
137 Qatar Eastern Mediterranean 2051 13.28 1.73 2.06 82 7.4 123.11 96.3 86440.0 95.7 96.6
138 Republic of Korea Western Pacific 49003 15.25 16.58 1.29 81 3.8 108.50 NaN 30370.0 99.3 98.4
139 Republic of Moldova Europe 3514 16.52 16.72 1.47 71 17.6 104.80 98.5 3640.0 90.1 90.1
140 Romania Europe 21755 15.05 20.66 1.39 74 12.2 109.16 97.7 15120.0 87.9 87.3
141 Russian Federation Europe 143000 15.45 18.60 1.51 69 10.3 179.31 99.6 20560.0 NaN NaN
142 Rwanda Africa 11458 43.56 3.94 4.73 60 55.0 40.63 71.1 1270.0 NaN NaN
143 Saint Kitts and Nevis Americas 54 25.96 12.35 NaN 74 9.2 NaN NaN 16470.0 85.8 86.2
144 Saint Lucia Americas 181 24.31 12.13 1.96 75 17.5 123.00 NaN 11220.0 90.2 89.2
145 Saint Vincent and the Grenadines Americas 109 25.70 9.92 2.05 74 23.4 120.52 NaN 10440.0 NaN NaN
146 Samoa Western Pacific 189 37.88 7.39 4.28 73 17.8 NaN 98.8 4270.0 93.2 97.1
147 San Marino Europe 31 14.04 26.97 NaN 83 3.3 111.75 NaN NaN NaN NaN
148 Sao Tome and Principe Africa 188 41.60 4.76 4.22 63 53.2 68.26 89.2 2080.0 NaN NaN
149 Saudi Arabia Eastern Mediterranean 28288 29.69 4.59 2.76 76 8.6 191.24 86.6 24700.0 96.7 96.5
150 Senegal Africa 13726 43.54 4.57 5.02 61 59.6 73.25 NaN 1940.0 75.9 80.2
151 Serbia Europe 9553 16.45 20.52 1.37 74 6.6 125.39 97.9 11540.0 94.7 94.4
152 Seychelles Africa 92 21.95 10.05 2.23 74 13.1 145.71 91.8 25140.0 NaN NaN
153 Sierra Leone Africa 5979 41.74 4.41 4.86 47 181.6 35.63 42.1 840.0 NaN NaN
154 Singapore Western Pacific 5303 16.48 15.13 1.27 82 2.9 150.24 95.9 59380.0 NaN NaN
155 Slovakia Europe 5446 15.00 18.60 1.37 76 7.5 109.35 NaN 22130.0 NaN NaN
156 Slovenia Europe 2068 14.16 23.16 1.49 80 3.1 106.56 99.7 26510.0 97.7 97.3
157 Solomon Islands Western Pacific 550 40.37 5.10 4.17 70 31.1 49.77 NaN 2350.0 87.7 87.3
158 Somalia Eastern Mediterranean 10195 47.35 4.46 6.77 50 147.4 6.85 NaN NaN NaN NaN
159 South Africa Africa 52386 29.53 8.44 2.44 58 44.6 126.83 NaN 10710.0 NaN NaN
160 South Sudan Eastern Mediterranean 10838 42.28 5.26 5.10 54 104.0 NaN NaN NaN NaN NaN
161 Spain Europe 46755 15.20 22.86 1.47 82 4.5 113.22 97.7 31400.0 99.7 99.8
162 Sri Lanka South-East Asia 21098 25.15 12.40 2.35 75 9.6 87.05 91.2 5520.0 93.9 94.4
163 Sudan Eastern Mediterranean 37195 41.48 4.99 4.56 62 73.1 56.14 71.1 2120.0 NaN NaN
164 Suriname Americas 535 27.83 9.55 2.32 72 20.8 178.88 94.7 NaN NaN NaN
165 Swaziland Africa 1231 38.05 5.34 3.48 50 79.7 63.70 87.4 5930.0 NaN NaN
166 Sweden Europe 9511 16.71 25.32 1.93 82 2.9 118.57 NaN 42200.0 99.7 99.0
167 Switzerland Europe 7997 14.79 23.25 1.51 83 4.3 131.43 NaN 52570.0 98.9 99.5
168 Syrian Arab Republic Eastern Mediterranean 21890 35.35 6.09 3.04 75 15.1 63.17 83.4 NaN NaN NaN
169 Tajikistan Europe 8009 35.75 4.80 3.81 68 58.3 90.64 99.7 2300.0 99.5 96.0
170 Thailand South-East Asia 66785 18.47 13.96 1.43 74 13.2 111.63 NaN 8360.0 NaN NaN
171 The former Yugoslav Republic of Macedonia Europe 2106 16.89 17.56 1.44 75 7.4 107.24 97.3 11090.0 97.3 99.2
172 Timor-Leste South-East Asia 1114 46.33 5.16 6.11 64 56.7 53.23 58.3 NaN 86.2 85.6
173 Togo Africa 6643 41.89 4.44 4.75 56 95.5 50.45 NaN 1040.0 NaN NaN
174 Tonga Western Pacific 105 37.33 7.96 3.86 72 12.8 52.63 NaN 5000.0 NaN NaN
175 Trinidad and Tobago Americas 1337 20.73 13.18 1.80 71 20.7 135.64 98.8 NaN 97.7 97.0
176 Tunisia Eastern Mediterranean 10875 23.22 10.49 2.04 76 16.1 116.93 NaN 9030.0 NaN NaN
177 Turkey Europe 73997 26.00 10.56 2.08 76 14.2 88.70 NaN 16940.0 99.5 98.3
178 Turkmenistan Europe 5173 28.65 6.30 2.38 63 52.8 68.77 99.6 8690.0 NaN NaN
179 Tuvalu Western Pacific 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN
180 Uganda Africa 36346 48.54 3.72 6.06 56 68.9 48.38 73.2 1310.0 89.7 92.3
181 Ukraine Europe 45530 14.18 20.76 1.45 71 10.7 122.98 99.7 7040.0 90.8 91.5
182 United Arab Emirates Eastern Mediterranean 9206 14.41 0.81 1.84 76 8.4 148.62 NaN 47890.0 NaN NaN
183 United Kingdom Europe 62783 17.54 23.06 1.90 80 4.8 130.75 NaN 36010.0 99.8 99.6
184 United Republic of Tanzania Africa 47783 44.85 4.89 5.36 59 54.0 55.53 73.2 1500.0 NaN NaN
185 United States of America Americas 318000 19.63 19.31 2.00 79 7.1 92.72 NaN 48820.0 95.4 96.1
186 Uruguay Americas 3395 22.05 18.59 2.07 77 7.2 140.75 98.1 14640.0 NaN NaN
187 Uzbekistan Europe 28541 28.90 6.38 2.38 68 39.6 91.65 99.4 3420.0 93.3 91.0
188 Vanuatu Western Pacific 247 37.37 6.02 3.46 72 17.9 55.76 82.6 4330.0 NaN NaN
189 Venezuela (Bolivarian Republic of) Americas 29955 28.84 9.17 2.44 75 15.3 97.78 NaN 12430.0 94.7 95.1
190 Viet Nam Western Pacific 90796 22.87 9.32 1.79 75 23.0 143.39 93.2 3250.0 NaN NaN
191 Yemen Eastern Mediterranean 23852 40.72 4.54 4.35 64 60.0 47.05 63.9 2170.0 85.5 70.5
192 Zambia Africa 14075 46.73 3.95 5.77 55 88.5 60.59 71.2 1490.0 91.4 93.9
193 Zimbabwe Africa 13724 40.24 5.68 3.64 54 89.8 72.13 92.2 NaN NaN NaN

Example-2

In [850]:
import pandas as pd
pd.reset_option('all',silent=True)
df_who= pd.read_csv('WHO_csv.csv')
pd.set_option('display.max_rows',6)
df_who
Out[850]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
... ... ... ... ... ... ... ... ... ... ... ... ... ...
191 Yemen Eastern Mediterranean 23852 40.72 4.54 4.35 64 60.0 47.05 63.9 2170.0 85.5 70.5
192 Zambia Africa 14075 46.73 3.95 5.77 55 88.5 60.59 71.2 1490.0 91.4 93.9
193 Zimbabwe Africa 13724 40.24 5.68 3.64 54 89.8 72.13 92.2 NaN NaN NaN

194 rows × 13 columns
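
When a display setting is only needed for one cell, pd.option_context applies it temporarily and restores the previous value on exit, so no reset_option call is required. The sketch below uses a small synthetic frame in place of the WHO data; the names numbers, default_rows, inside and after are hypothetical.

```python
import pandas as pd

# Synthetic stand-in for a long table.
numbers = pd.DataFrame({"n": range(100)})

default_rows = pd.get_option("display.max_rows")

# The setting applies only inside the with-block.
with pd.option_context("display.max_rows", 6):
    inside = pd.get_option("display.max_rows")
    print(numbers)  # printed in truncated form

# The previous value is back in effect here.
after = pd.get_option("display.max_rows")
print(inside, after)
```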

reset_option

In [851]:
pd.reset_option('display.max_rows')
df_who
#Method-2
#pd.reset_option('all',silent=True)
Out[851]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2
5 Antigua and Barbuda Americas 89 25.96 12.35 2.12 75 9.9 196.41 99.0 17900.0 91.1 84.5
6 Argentina Americas 41087 24.42 14.97 2.20 76 14.2 134.92 97.8 17130.0 NaN NaN
7 Armenia Europe 2969 20.34 14.06 1.74 71 16.4 103.57 99.6 6100.0 NaN NaN
8 Australia Western Pacific 23050 18.95 19.46 1.89 82 4.9 108.34 NaN 38110.0 96.9 97.5
9 Austria Europe 8464 14.51 23.52 1.44 81 4.0 154.78 NaN 42050.0 NaN NaN
10 Azerbaijan Europe 9309 22.25 8.24 1.96 71 35.2 108.75 NaN 8960.0 85.3 84.1
11 Bahamas Americas 372 21.62 11.24 1.90 75 16.9 86.06 NaN NaN NaN NaN
12 Bahrain Eastern Mediterranean 1318 20.16 3.38 2.12 79 9.6 127.96 91.9 NaN NaN NaN
13 Bangladesh South-East Asia 155000 30.57 6.89 2.24 70 40.9 56.06 56.8 1940.0 NaN NaN
14 Barbados Americas 283 18.99 15.78 1.84 78 18.4 127.01 NaN NaN NaN NaN
15 Belarus Europe 9405 15.10 19.31 1.47 71 5.2 111.88 NaN 14460.0 NaN NaN
16 Belgium Europe 11060 16.88 23.81 1.85 80 4.2 116.61 NaN 39190.0 98.9 99.2
17 Belize Americas 324 34.40 5.74 2.76 74 18.3 69.96 NaN 6090.0 NaN NaN
18 Benin Africa 10051 42.95 4.54 5.01 57 89.5 85.33 42.4 1620.0 NaN NaN
19 Bhutan South-East Asia 742 28.53 6.90 2.32 67 44.6 65.58 NaN 5570.0 88.3 91.5
20 Bolivia (Plurinational State of) Americas 10496 35.23 7.28 3.31 67 41.4 82.82 NaN 4890.0 91.2 91.5
21 Bosnia and Herzegovina Europe 3834 16.35 20.52 1.26 76 6.7 84.52 97.9 9190.0 86.5 88.4
22 Botswana Africa 2004 33.75 5.63 2.71 66 53.3 142.82 84.5 14550.0 NaN NaN
23 Brazil Americas 199000 24.56 10.81 1.82 74 14.4 124.26 NaN 11420.0 NaN NaN
24 Brunei Darussalam Western Pacific 412 25.75 7.03 2.03 77 8.0 109.17 95.2 NaN NaN NaN
25 Bulgaria Europe 7278 13.53 26.11 1.51 74 12.1 140.68 NaN 14160.0 99.3 99.7
26 Burkina Faso Africa 16460 45.66 3.88 5.78 56 102.4 45.27 NaN 1300.0 60.7 55.9
27 Burundi Africa 9850 44.20 3.87 6.21 53 104.3 22.33 67.2 610.0 NaN NaN
28 Cambodia Western Pacific 14865 31.23 7.67 2.93 65 39.7 96.17 NaN 2230.0 96.4 95.4
29 Cameroon Africa 21700 43.08 4.89 4.94 53 94.9 52.35 NaN 2330.0 99.6 87.4
... ... ... ... ... ... ... ... ... ... ... ... ... ...
164 Suriname Americas 535 27.83 9.55 2.32 72 20.8 178.88 94.7 NaN NaN NaN
165 Swaziland Africa 1231 38.05 5.34 3.48 50 79.7 63.70 87.4 5930.0 NaN NaN
166 Sweden Europe 9511 16.71 25.32 1.93 82 2.9 118.57 NaN 42200.0 99.7 99.0
167 Switzerland Europe 7997 14.79 23.25 1.51 83 4.3 131.43 NaN 52570.0 98.9 99.5
168 Syrian Arab Republic Eastern Mediterranean 21890 35.35 6.09 3.04 75 15.1 63.17 83.4 NaN NaN NaN
169 Tajikistan Europe 8009 35.75 4.80 3.81 68 58.3 90.64 99.7 2300.0 99.5 96.0
170 Thailand South-East Asia 66785 18.47 13.96 1.43 74 13.2 111.63 NaN 8360.0 NaN NaN
171 The former Yugoslav Republic of Macedonia Europe 2106 16.89 17.56 1.44 75 7.4 107.24 97.3 11090.0 97.3 99.2
172 Timor-Leste South-East Asia 1114 46.33 5.16 6.11 64 56.7 53.23 58.3 NaN 86.2 85.6
173 Togo Africa 6643 41.89 4.44 4.75 56 95.5 50.45 NaN 1040.0 NaN NaN
174 Tonga Western Pacific 105 37.33 7.96 3.86 72 12.8 52.63 NaN 5000.0 NaN NaN
175 Trinidad and Tobago Americas 1337 20.73 13.18 1.80 71 20.7 135.64 98.8 NaN 97.7 97.0
176 Tunisia Eastern Mediterranean 10875 23.22 10.49 2.04 76 16.1 116.93 NaN 9030.0 NaN NaN
177 Turkey Europe 73997 26.00 10.56 2.08 76 14.2 88.70 NaN 16940.0 99.5 98.3
178 Turkmenistan Europe 5173 28.65 6.30 2.38 63 52.8 68.77 99.6 8690.0 NaN NaN
179 Tuvalu Western Pacific 10 30.61 9.07 NaN 64 29.7 21.63 NaN NaN NaN NaN
180 Uganda Africa 36346 48.54 3.72 6.06 56 68.9 48.38 73.2 1310.0 89.7 92.3
181 Ukraine Europe 45530 14.18 20.76 1.45 71 10.7 122.98 99.7 7040.0 90.8 91.5
182 United Arab Emirates Eastern Mediterranean 9206 14.41 0.81 1.84 76 8.4 148.62 NaN 47890.0 NaN NaN
183 United Kingdom Europe 62783 17.54 23.06 1.90 80 4.8 130.75 NaN 36010.0 99.8 99.6
184 United Republic of Tanzania Africa 47783 44.85 4.89 5.36 59 54.0 55.53 73.2 1500.0 NaN NaN
185 United States of America Americas 318000 19.63 19.31 2.00 79 7.1 92.72 NaN 48820.0 95.4 96.1
186 Uruguay Americas 3395 22.05 18.59 2.07 77 7.2 140.75 98.1 14640.0 NaN NaN
187 Uzbekistan Europe 28541 28.90 6.38 2.38 68 39.6 91.65 99.4 3420.0 93.3 91.0
188 Vanuatu Western Pacific 247 37.37 6.02 3.46 72 17.9 55.76 82.6 4330.0 NaN NaN
189 Venezuela (Bolivarian Republic of) Americas 29955 28.84 9.17 2.44 75 15.3 97.78 NaN 12430.0 94.7 95.1
190 Viet Nam Western Pacific 90796 22.87 9.32 1.79 75 23.0 143.39 93.2 3250.0 NaN NaN
191 Yemen Eastern Mediterranean 23852 40.72 4.54 4.35 64 60.0 47.05 63.9 2170.0 85.5 70.5
192 Zambia Africa 14075 46.73 3.95 5.77 55 88.5 60.59 71.2 1490.0 91.4 93.9
193 Zimbabwe Africa 13724 40.24 5.68 3.64 54 89.8 72.13 92.2 NaN NaN NaN

194 rows × 13 columns
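As a complement to set_option/reset_option, the sketch below uses pd.option_context, which applies an option only inside a with block and restores the previous value on exit. The demo frame is a hypothetical stand-in for df_who, not the WHO data.

```python
import pandas as pd

# Hypothetical stand-in for df_who: 100 rows, one column.
demo = pd.DataFrame({'value': range(100)})

# Remember whatever the current setting is.
default_rows = pd.get_option('display.max_rows')

# Inside the block, at most 6 rows are rendered.
with pd.option_context('display.max_rows', 6):
    rows_inside = pd.get_option('display.max_rows')
    short_repr = repr(demo)

# On exit the previous setting is restored, so no reset_option call is needed.
rows_after = pd.get_option('display.max_rows')

print(short_repr)
print(rows_inside, rows_after)
```

This form is handy when only one cell needs a shortened display and the rest of the notebook should keep its defaults.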

colwidth

In [852]:
pd.reset_option('all',silent=True)
df_who= pd.read_csv('WHO_csv.csv')
pd.set_option('display.max_colwidth',10)
df_who.head()
Out[852]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghan... Easter... 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230.0 93.1 78.2

Display Precision

In [853]:
pd.reset_option('all',silent=True)
df_who= pd.read_csv('WHO_csv.csv')
pd.set_option('display.precision',1)
df_who.head()
Out[853]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.4 3.8 5.4 60 98.5 54.3 NaN 1140.0 NaN NaN
1 Albania Europe 3162 21.3 14.9 1.8 74 16.7 96.4 NaN 8820.0 NaN NaN
2 Algeria Africa 38482 27.4 7.2 2.8 73 20.0 99.0 NaN 8310.0 98.2 96.4
3 Andorra Europe 78 15.2 22.9 NaN 82 3.2 75.5 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.6 3.8 6.1 51 163.5 48.4 70.1 5230.0 93.1 78.2

Display Thousands Separator (,) in Floats

In [854]:
pd.reset_option('all',silent=True)
df_who= pd.read_csv('WHO_csv.csv')
pd.set_option('display.float_format','{:,}'.format)
df_who.head()
Out[854]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.4 60 98.5 54.26 nan 1,140.0 nan nan
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 nan 8,820.0 nan nan
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 nan 8,310.0 98.2 96.4
3 Andorra Europe 78 15.2 22.86 nan 82 3.2 75.49 nan nan 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.1 51 163.5 48.38 70.1 5,230.0 93.1 78.2
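The value given to display.float_format is an ordinary Python callable, so its effect can be checked on single numbers. The sketch below may help explain why Out[854] shows 5.4 instead of 5.40 and nan instead of NaN. The second formatter, with a fixed two decimals, is an illustrative variation and not part of the original example.

```python
# The same callable that was passed to display.float_format above.
with_commas = '{:,}'.format

print(with_commas(1140.0))        # thousands separator added
print(with_commas(5.40))          # trailing zero is not kept
print(with_commas(float('nan')))  # Python's own spelling of NaN

# Illustrative variation: commas plus a fixed number of decimals.
with_commas_2dp = '{:,.2f}'.format
print(with_commas_2dp(1140.0))
print(with_commas_2dp(5.4))
```

Integer columns such as Population are left untouched in Out[854] because this option applies to floating-point values only.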

Style a DataFrame

  • The previous trick is useful if you want to change the display of your entire notebook. However, a more flexible and powerful approach is to define the style of a particular DataFrame.
  • Let's return to the stocks DataFrame:
In [855]:
df= pd.read_csv('Sale.txt',sep=",",parse_dates=['Date'])
df
Out[855]:
Date Close Volume Symbol
0 2016-10-03 31.5 14070500 CSCO
1 2016-10-03 112.52 21701800 AAPL
2 2016-10-03 57.42 19189500 MSFT
3 2016-10-04 113.0 29736800 AAPL
4 2016-10-04 57.24 20085900 MSFT
5 2016-10-04 31.35 18460400 CSCO
6 2016-10-05 57.64 16726400 MSFT
7 2016-10-05 31.59 11808600 CSCO
8 2016-10-05 113.05 21453100 AAPL
  • We can create a dictionary of format strings that specifies how each column should be formatted
  • And then we can pass it to the DataFrame's style.format() method:
In [856]:
df.style.format({'Symbol': str.lower,'Volume':'{:,}','Date':'{:%m/%d/%y}','Close':'${:.2f}'})
Out[856]:
Date Close Volume Symbol
0 10/03/16 $31.50 14,070,500 csco
1 10/03/16 $112.52 21,701,800 aapl
2 10/03/16 $57.42 19,189,500 msft
3 10/04/16 $113.00 29,736,800 aapl
4 10/04/16 $57.24 20,085,900 msft
5 10/04/16 $31.35 18,460,400 csco
6 10/05/16 $57.64 16,726,400 msft
7 10/05/16 $31.59 11,808,600 csco
8 10/05/16 $113.05 21,453,100 aapl
  • Notice that the Date is now in month-day-year format, the closing price has a dollar sign, and the Volume has commas.
  • We can apply more styling by chaining additional methods
In [857]:
(df.style.format({'Symbol': str.lower,'Volume':'{:,}','Date':'{:%m/%d/%y}','Close':'${:.2f}'})
 .hide_index()
 .highlight_min(['Close','Volume'], color='red')
 .highlight_max('Close', color='lightgreen')
)
Out[857]:
Date Close Volume Symbol
10/03/16 $31.50 14,070,500 csco
10/03/16 $112.52 21,701,800 aapl
10/03/16 $57.42 19,189,500 msft
10/04/16 $113.00 29,736,800 aapl
10/04/16 $57.24 20,085,900 msft
10/04/16 $31.35 18,460,400 csco
10/05/16 $57.64 16,726,400 msft
10/05/16 $31.59 11,808,600 csco
10/05/16 $113.05 21,453,100 aapl
  • We've now hidden the index, highlighted the minimum Close and Volume values in red, and highlighted the maximum Close value in light green.
  • Here's another example of DataFrame styling:
In [858]:
(df.style.format({'Symbol': str.lower,'Volume':'{:,}','Date':'{:%m/%d/%y}','Close':'${:.2f}'})
 .hide_index()
 .background_gradient(subset='Volume', cmap='Blues')
)
Out[858]:
Date Close Volume Symbol
10/03/16 $31.50 14,070,500 csco
10/03/16 $112.52 21,701,800 aapl
10/03/16 $57.42 19,189,500 msft
10/04/16 $113.00 29,736,800 aapl
10/04/16 $57.24 20,085,900 msft
10/04/16 $31.35 18,460,400 csco
10/05/16 $57.64 16,726,400 msft
10/05/16 $31.59 11,808,600 csco
10/05/16 $113.05 21,453,100 aapl
  • The Volume column now has a background gradient to help you easily identify high and low values.
  • And here's one final example
In [859]:
(df.style.format({'Symbol': str.lower,'Volume':'{:,}','Date':'{:%m/%d/%y}','Close':'${:.2f}'})
 .hide_index()
 .bar('Volume', color='lightblue', align='zero')
 .set_caption('Stock Prices from October 2016')
)
Out[859]:
Stock Prices from October 2016
Date Close Volume Symbol
10/03/16 $31.50 14,070,500 csco
10/03/16 $112.52 21,701,800 aapl
10/03/16 $57.42 19,189,500 msft
10/04/16 $113.00 29,736,800 aapl
10/04/16 $57.24 20,085,900 msft
10/04/16 $31.35 18,460,400 csco
10/05/16 $57.64 16,726,400 msft
10/05/16 $31.59 11,808,600 csco
10/05/16 $113.05 21,453,100 aapl
  • There's now a bar chart within the Volume column and a caption above the DataFrame.
  • Note that there are many more options for how you can style your DataFrame
  • More Information
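The format dictionary used with style.format() in this section mixes a callable (str.lower) with ordinary Python format strings. The sketch below applies the same specs to one row by hand, using only the standard library, to show what each spec produces. The apply_format helper is hypothetical and only mimics the idea; it is not how the Styler is implemented.

```python
from datetime import datetime

# Same specs as in the style.format() call.
formats = {
    'Symbol': str.lower,
    'Volume': '{:,}',
    'Date': '{:%m/%d/%y}',
    'Close': '${:.2f}',
}

# First row of the stocks data shown in Out[855].
row = {
    'Date': datetime(2016, 10, 3),
    'Close': 31.5,
    'Volume': 14070500,
    'Symbol': 'CSCO',
}

def apply_format(spec, value):
    """Hypothetical helper: call the spec if it is callable, else use str.format."""
    return spec(value) if callable(spec) else spec.format(value)

rendered = {col: apply_format(formats[col], row[col]) for col in row}
print(rendered)
```

Checking a spec on a single value like this is a quick way to debug a format string before passing it to the Styler.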

11.42 Time Series Analysis

  • A time series is a set of data points indexed in time order.
In [860]:
from IPython.display import YouTubeVideo
YouTubeVideo('yCgJGsg0Xa4',width=900, height=500)
Out[860]:
In [861]:
from IPython.display import YouTubeVideo
YouTubeVideo('r0s4slGHwzE',width=900, height=500)
Out[861]:
In [862]:
df= pd.read_csv('AAPL.csv')
In [863]:
df.head()
Out[863]:
Date Open High Low Close Adj Close Volume
0 1/2/2018 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
1 1/3/2018 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900
2 1/4/2018 172.53999299999998 173.470001 172.080002 173.029999 170.47070300000001 22434600
3 1/5/2018 173.440002 175.369995 173.050003 175.0 172.41156 23660000
4 1/8/2018 174.350006 175.610001 173.929993 174.350006 171.77117900000002 20567800
In [864]:
type(df.Date[0])
Out[864]:
str

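Solution

  • By default, read_csv loads the Date column as plain strings (str). Pass parse_dates=['Date'] so that the column is parsed into Timestamp values.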

In [865]:
df= pd.read_csv('AAPL.csv',parse_dates=['Date'])
In [866]:
df.head()
Out[866]:
Date Open High Low Close Adj Close Volume
0 2018-01-02 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
1 2018-01-03 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900
2 2018-01-04 172.53999299999998 173.470001 172.080002 173.029999 170.47070300000001 22434600
3 2018-01-05 173.440002 175.369995 173.050003 175.0 172.41156 23660000
4 2018-01-08 174.350006 175.610001 173.929993 174.350006 171.77117900000002 20567800
In [867]:
type(df.Date[0])
Out[867]:
pandas._libs.tslibs.timestamps.Timestamp
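The difference between the two read_csv calls can be reproduced without the AAPL.csv file. The sketch below reads a two-row CSV from an in-memory string (the values are illustrative) and also shows pd.to_datetime as an alternative for a column that has already been loaded as text.

```python
import io
import pandas as pd

# Illustrative two-row CSV held in memory instead of a file on disk.
csv_text = "Date,Close\n1/2/2018,172.26\n1/3/2018,172.23\n"

# Without parse_dates the Date column holds plain strings.
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the column is converted while reading.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

# Alternative: convert an already-loaded text column with an explicit format.
converted = pd.to_datetime(raw['Date'], format='%m/%d/%Y')

print(type(raw['Date'][0]))
print(type(parsed['Date'][0]))
print(converted)
```

Giving an explicit format to pd.to_datetime removes any ambiguity between month-first and day-first dates.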

Set the Date Column as the Index

In [868]:
df= pd.read_csv('AAPL.csv',parse_dates=['Date'],index_col='Date')
df.head()
Out[868]:
Open High Low Close Adj Close Volume
Date
2018-01-02 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-03 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900
2018-01-04 172.53999299999998 173.470001 172.080002 173.029999 170.47070300000001 22434600
2018-01-05 173.440002 175.369995 173.050003 175.0 172.41156 23660000
2018-01-08 174.350006 175.610001 173.929993 174.350006 171.77117900000002 20567800
In [869]:
df.index
Out[869]:
DatetimeIndex(['2018-01-02', '2018-01-03', '2018-01-04', '2018-01-05',
               '2018-01-08', '2018-01-09', '2018-01-10', '2018-01-11',
               '2018-01-12', '2018-01-15',
               ...
               '2018-11-23', '2018-11-26', '2018-11-27', '2018-11-28',
               '2018-11-29', '2018-11-30', '2018-12-03', '2018-12-04',
               '2018-12-06', '2018-12-07'],
              dtype='datetime64[ns]', name='Date', length=237, freq=None)

Partial String Indexing

Example-1

In [870]:
# Use .loc for partial-string row selection; df['2018-01'] works only in older pandas releases
df.loc['2018-01']
Out[870]:
Open High Low Close Adj Close Volume
Date
2018-01-02 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-03 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900
2018-01-04 172.53999299999998 173.470001 172.080002 173.029999 170.47070300000001 22434600
2018-01-05 173.440002 175.369995 173.050003 175.0 172.41156 23660000
2018-01-08 174.350006 175.610001 173.929993 174.350006 171.77117900000002 20567800
2018-01-09 174.550003 175.059998 173.41000400000001 174.330002 171.751465 21584000
2018-01-10 173.16000400000001 174.300003 173.0 174.28999299999998 171.712051 23959900
2018-01-11 174.58999599999999 175.490005 174.490005 175.279999 172.687408 18667700
2018-01-12 176.179993 177.360001 175.649994 177.08999599999999 174.470642 25226000
2018-01-15 177.899994 179.389999 176.139999 176.190002 173.583969 29565900
2018-01-16 177.899994 179.389999 176.139999 176.190002 173.583969 29565900
2018-01-17 176.149994 179.25 175.070007 179.100006 176.450928 34386800
2018-01-18 179.369995 180.100006 178.25 179.259995 176.608551 31193400
2018-01-19 178.610001 179.580002 177.41000400000001 178.46000700000002 175.82038899999998 32425100
2018-01-22 177.300003 177.779999 176.600006 177.0 174.381973 27108600
2018-01-23 177.300003 179.440002 176.820007 177.03999299999998 174.421387 32689100
2018-01-24 177.25 177.300003 173.199997 174.220001 171.643082 51105100
2018-01-25 174.509995 174.949997 170.529999 171.110001 168.57908600000002 41529000
2018-01-26 172.0 172.0 170.059998 171.509995 168.973175 39143000
2018-01-29 170.16000400000001 170.16000400000001 167.070007 167.96000700000002 165.475677 50640400
2018-01-30 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-31 166.869995 168.440002 166.5 167.429993 164.953522 32478900

Example-2

  • Find the mean closing price of Apple's stock in January 2018.
In [871]:
df.loc['2018-01'].Close.mean()
Out[871]:
174.10454495454547

Example-3

In [872]:
df.loc['2018-01-29']
Out[872]:
Open        170.16000400000001
High        170.16000400000001
Low                 167.070007
Close       167.96000700000002
Adj Close           165.475677
Volume            50,640,400.0
Name: 2018-01-29 00:00:00, dtype: float64

Example-4

In [873]:
df['2018-01-15':'2018-01-20']
Out[873]:
Open High Low Close Adj Close Volume
Date
2018-01-15 177.899994 179.389999 176.139999 176.190002 173.583969 29565900
2018-01-16 177.899994 179.389999 176.139999 176.190002 173.583969 29565900
2018-01-17 176.149994 179.25 175.070007 179.100006 176.450928 34386800
2018-01-18 179.369995 180.100006 178.25 179.259995 176.608551 31193400
2018-01-19 178.610001 179.580002 177.41000400000001 178.46000700000002 175.82038899999998 32425100

Example-5

In [874]:
df['2018-01-15':'2018-01-20'].mean()
Out[874]:
Open               177.9859956
High        179.54200120000002
Low                176.6020018
Close              177.8400024
Adj Close          175.2095612
Volume            31,427,420.0
dtype: float64

Example-6

In [875]:
df['2018-01-15':'2018-01-20'].Close.mean()
Out[875]:
177.8400024
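The selections in Examples 1 to 6 can be tried on a tiny synthetic series. The sketch below (made-up values, not AAPL prices) shows that a partial string such as '2018-01' selects the whole month and that a date slice on a DatetimeIndex includes both end points.

```python
import pandas as pd

# Five consecutive days spanning the end of January (synthetic values).
idx = pd.date_range('2018-01-29', '2018-02-02', freq='D')
close = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=idx)

# Partial string: every row whose date falls in January 2018.
jan = close.loc['2018-01']
jan_mean = jan.mean()

# Date slice: unlike positional slicing, both end points are included.
window = close.loc['2018-01-30':'2018-02-01']

print(jan)
print(jan_mean)
print(window)
```

Because the end point is inclusive, the slice '2018-01-15':'2018-01-20' in Example-4 stops at 2018-01-19 only because the 20th is a Saturday with no row in the data.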

Resampling

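  • resample() is a convenience method for frequency conversion and resampling of time series: it groups the rows into time bins (for example, calendar months) and then applies an aggregation such as mean() to each bin.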
  • More Information

Example-1

In [876]:
df.resample('M').mean()
Out[876]:
Open High Low Close Adj Close Volume
Date
2018-01-31 174.19772622727268 175.39363718181815 172.96909195454543 174.10454495454547 171.52934677272728 31,320,600.0
2018-02-28 167.2763157368421 169.88789278947365 165.47947457894733 167.6389472631579 165.62927642105265 48,836,542.10526316
2018-03-31 175.04714304761904 176.7995240952381 173.06143033333328 174.49619171428571 172.61611876190477 33,952,900.0
2018-04-30 169.76523847619052 171.65761814285713 168.2923816666667 169.8342865238095 168.00444314285713 31,717,042.85714286
2018-05-31 184.9563653636364 186.4690905909091 183.86227281818176 185.5368180909091 183.99324659090908 28,226,195.454545453
2018-06-30 188.720477 189.83190557142856 187.41904704761902 188.62142876190478 187.30866999999995 25,124,976.19047619
2018-07-31 190.1961902857143 191.5399999047619 189.0147603809524 190.31142904761901 188.98690871428573 18,747,209.523809522
2018-08-31 212.48695713043483 214.56130386956525 211.30565221739135 213.34608852173915 212.38483847826095 30,448,647.826086957
2018-09-30 222.33105299999997 224.53052636842102 220.07789526315787 222.0736854210526 221.30152652631577 35,735,368.421052635
2018-10-31 221.20000100000004 223.94087082608695 217.87434713043476 220.84565204347825 220.07776343478267 34,336,891.30434783
2018-11-30 191.81952338095238 193.9495231904762 188.70047514285713 191.23571409523805 191.06268742857142 45,765,071.428571425
2018-12-31 177.665001 179.15000125 174.050003 176.18000375 176.18000375 41,735,225.0

Example-2

In [877]:
df['Close'].resample('M').mean()
Out[877]:
Date
2018-01-31   174.10454495454547
2018-02-28    167.6389472631579
2018-03-31   174.49619171428571
2018-04-30    169.8342865238095
2018-05-31    185.5368180909091
2018-06-30   188.62142876190478
2018-07-31   190.31142904761901
2018-08-31   213.34608852173915
2018-09-30    222.0736854210526
2018-10-31   220.84565204347825
2018-11-30   191.23571409523805
2018-12-31         176.18000375
Freq: M, Name: Close, dtype: float64
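A small worked case makes the grouping behind resample() visible. The sketch below uses four synthetic daily values that straddle a month boundary. It uses the 'MS' (month start) alias as an assumption for portability: the 'M' alias used above labels each bin with the month end, and recent pandas releases spell that alias 'ME'.

```python
import pandas as pd

# Jan 30, Jan 31, Feb 1, Feb 2 (synthetic values).
idx = pd.date_range('2018-01-30', periods=4, freq='D')
s = pd.Series([1.0, 3.0, 10.0, 20.0], index=idx)

# One bin per calendar month, labelled with the first day of the month.
monthly = s.resample('MS').mean()

print(monthly)
```

January averages its two values (1 and 3) and February averages its own two (10 and 20), which is the same computation as Example-2 applied to the Close column.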

Date Range

In [878]:
from IPython.display import YouTubeVideo
YouTubeVideo('A9c7hGXQ5A8',width=900, height=500)
Out[878]:

Example-1

In [879]:
df= pd.read_csv('AAPLwd.csv')
df.head()
Out[879]:
Open High Low Close Adj Close Volume
0 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
1 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900
2 172.53999299999998 173.470001 172.080002 173.029999 170.47070300000001 22434600
3 173.440002 175.369995 173.050003 175.0 172.41156 23660000
4 174.350006 175.610001 173.929993 174.350006 171.77117900000002 20567800
  • The Date column is missing from this data set, so we generate the dates with date_range and use them as the index.
In [880]:
df1= pd.date_range(start="1/2/2018", end="1/31/2018",freq='B')
df1
Out[880]:
DatetimeIndex(['2018-01-02', '2018-01-03', '2018-01-04', '2018-01-05',
               '2018-01-08', '2018-01-09', '2018-01-10', '2018-01-11',
               '2018-01-12', '2018-01-15', '2018-01-16', '2018-01-17',
               '2018-01-18', '2018-01-19', '2018-01-22', '2018-01-23',
               '2018-01-24', '2018-01-25', '2018-01-26', '2018-01-29',
               '2018-01-30', '2018-01-31'],
              dtype='datetime64[ns]', freq='B')
In [881]:
df.set_index(df1,inplace=True)
df
Out[881]:
Open High Low Close Adj Close Volume
2018-01-02 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-03 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900
2018-01-04 172.53999299999998 173.470001 172.080002 173.029999 170.47070300000001 22434600
2018-01-05 173.440002 175.369995 173.050003 175.0 172.41156 23660000
2018-01-08 174.350006 175.610001 173.929993 174.350006 171.77117900000002 20567800
2018-01-09 174.550003 175.059998 173.41000400000001 174.330002 171.751465 21584000
2018-01-10 173.16000400000001 174.300003 173.0 174.28999299999998 171.712051 23959900
2018-01-11 174.58999599999999 175.490005 174.490005 175.279999 172.687408 18667700
2018-01-12 176.179993 177.360001 175.649994 177.08999599999999 174.470642 25226000
2018-01-15 177.899994 179.389999 176.139999 176.190002 173.583969 29565900
2018-01-16 177.899994 179.389999 176.139999 176.190002 173.583969 29565900
2018-01-17 176.149994 179.25 175.070007 179.100006 176.450928 34386800
2018-01-18 179.369995 180.100006 178.25 179.259995 176.608551 31193400
2018-01-19 178.610001 179.580002 177.41000400000001 178.46000700000002 175.82038899999998 32425100
2018-01-22 177.300003 177.779999 176.600006 177.0 174.381973 27108600
2018-01-23 177.300003 179.440002 176.820007 177.03999299999998 174.421387 32689100
2018-01-24 177.25 177.300003 173.199997 174.220001 171.643082 51105100
2018-01-25 174.509995 174.949997 170.529999 171.110001 168.57908600000002 41529000
2018-01-26 172.0 172.0 170.059998 171.509995 168.973175 39143000
2018-01-29 170.16000400000001 170.16000400000001 167.070007 167.96000700000002 165.475677 50640400
2018-01-30 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-31 166.869995 168.440002 166.5 167.429993 164.953522 32478900
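The same pattern can be sketched on a few rows. The frame below is a hypothetical five-row stand-in for AAPLwd.csv. Using periods=len(prices) instead of an end date is an illustrative choice that keeps the generated index the same length as the data, which set_index requires.

```python
import pandas as pd

# Hypothetical stand-in for the file without a Date column.
prices = pd.DataFrame({'Close': [172.26, 172.23, 173.03, 175.00, 174.35]})

# One business day per row, starting on Tuesday 2018-01-02.
dates = pd.date_range(start='2018-01-02', periods=len(prices), freq='B')
prices_indexed = prices.set_index(dates)

# The start/end form from the example above yields 22 business days.
january = pd.date_range(start='1/2/2018', end='1/31/2018', freq='B')

print(prices_indexed)
print(len(january))
```

Note that freq='B' skips weekends only; it has no knowledge of market holidays, so the generated dates should be checked against the real trading calendar.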

Example-2

  • Include weekends in the data set.
  • method='pad': forward-fill method (Saturday and Sunday take their values from Friday).
In [882]:
df.asfreq('D',method='pad')
Out[882]:
Open High Low Close Adj Close Volume
2018-01-02 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-03 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900
2018-01-04 172.53999299999998 173.470001 172.080002 173.029999 170.47070300000001 22434600
2018-01-05 173.440002 175.369995 173.050003 175.0 172.41156 23660000
2018-01-06 173.440002 175.369995 173.050003 175.0 172.41156 23660000
2018-01-07 173.440002 175.369995 173.050003 175.0 172.41156 23660000
2018-01-08 174.350006 175.610001 173.929993 174.350006 171.77117900000002 20567800
2018-01-09 174.550003 175.059998 173.41000400000001 174.330002 171.751465 21584000
2018-01-10 173.16000400000001 174.300003 173.0 174.28999299999998 171.712051 23959900
2018-01-11 174.58999599999999 175.490005 174.490005 175.279999 172.687408 18667700
2018-01-12 176.179993 177.360001 175.649994 177.08999599999999 174.470642 25226000
2018-01-13 176.179993 177.360001 175.649994 177.08999599999999 174.470642 25226000
2018-01-14 176.179993 177.360001 175.649994 177.08999599999999 174.470642 25226000
2018-01-15 177.899994 179.389999 176.139999 176.190002 173.583969 29565900
2018-01-16 177.899994 179.389999 176.139999 176.190002 173.583969 29565900
2018-01-17 176.149994 179.25 175.070007 179.100006 176.450928 34386800
2018-01-18 179.369995 180.100006 178.25 179.259995 176.608551 31193400
2018-01-19 178.610001 179.580002 177.41000400000001 178.46000700000002 175.82038899999998 32425100
2018-01-20 178.610001 179.580002 177.41000400000001 178.46000700000002 175.82038899999998 32425100
2018-01-21 178.610001 179.580002 177.41000400000001 178.46000700000002 175.82038899999998 32425100
2018-01-22 177.300003 177.779999 176.600006 177.0 174.381973 27108600
2018-01-23 177.300003 179.440002 176.820007 177.03999299999998 174.421387 32689100
2018-01-24 177.25 177.300003 173.199997 174.220001 171.643082 51105100
2018-01-25 174.509995 174.949997 170.529999 171.110001 168.57908600000002 41529000
2018-01-26 172.0 172.0 170.059998 171.509995 168.973175 39143000
2018-01-27 172.0 172.0 170.059998 171.509995 168.973175 39143000
2018-01-28 172.0 172.0 170.059998 171.509995 168.973175 39143000
2018-01-29 170.16000400000001 170.16000400000001 167.070007 167.96000700000002 165.475677 50640400
2018-01-30 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-31 166.869995 168.440002 166.5 167.429993 164.953522 32478900

Example-3

In [883]:
df.asfreq('W',method='pad')
Out[883]:
Open High Low Close Adj Close Volume
2018-01-07 173.440002 175.369995 173.050003 175.0 172.41156 23660000
2018-01-14 176.179993 177.360001 175.649994 177.08999599999999 174.470642 25226000
2018-01-21 178.610001 179.580002 177.41000400000001 178.46000700000002 175.82038899999998 32425100
2018-01-28 172.0 172.0 170.059998 171.509995 168.973175 39143000

Example-4

In [884]:
df.asfreq('H',method='pad')
Out[884]:
Open High Low Close Adj Close Volume
2018-01-02 00:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 01:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 02:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 03:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 04:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 05:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 06:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 07:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 08:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 09:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 10:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 11:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 12:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 13:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 14:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 15:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 16:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 17:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 18:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 19:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 20:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 21:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 22:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 23:00:00 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-03 00:00:00 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900
2018-01-03 01:00:00 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900
2018-01-03 02:00:00 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900
2018-01-03 03:00:00 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900
2018-01-03 04:00:00 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900
2018-01-03 05:00:00 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900
... ... ... ... ... ... ...
2018-01-29 19:00:00 170.16000400000001 170.16000400000001 167.070007 167.96000700000002 165.475677 50640400
2018-01-29 20:00:00 170.16000400000001 170.16000400000001 167.070007 167.96000700000002 165.475677 50640400
2018-01-29 21:00:00 170.16000400000001 170.16000400000001 167.070007 167.96000700000002 165.475677 50640400
2018-01-29 22:00:00 170.16000400000001 170.16000400000001 167.070007 167.96000700000002 165.475677 50640400
2018-01-29 23:00:00 170.16000400000001 170.16000400000001 167.070007 167.96000700000002 165.475677 50640400
2018-01-30 00:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 01:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 02:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 03:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 04:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 05:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 06:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 07:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 08:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 09:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 10:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 11:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 12:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 13:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 14:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 15:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 16:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 17:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 18:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 19:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 20:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 21:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 22:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-30 23:00:00 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-31 00:00:00 166.869995 168.440002 166.5 167.429993 164.953522 32478900

697 rows × 6 columns
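The effect of the fill method in asfreq can be seen on two rows. The sketch below uses a synthetic Friday/Monday pair and compares asfreq with and without filling. It uses method='ffill', which is the alias of 'pad'.

```python
import pandas as pd

# Friday 2018-01-05 and Monday 2018-01-08 (synthetic values).
idx = pd.date_range('2018-01-05', periods=2, freq='B')
s = pd.Series([175.0, 174.35], index=idx)

# Without a fill method the new weekend rows are NaN.
gaps = s.asfreq('D')

# With forward fill, Saturday and Sunday repeat Friday's value.
daily = s.asfreq('D', method='ffill')

print(gaps)
print(daily)
```

The same logic explains the 697 rows in Example-4: the hourly index runs from 2018-01-02 00:00 to 2018-01-31 00:00, which is 29 days of 24 hours plus the final timestamp.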

Example-5

In [885]:
df= pd.date_range(start='1/1/2017',periods=72, freq='B')
df
Out[885]:
DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05',
               '2017-01-06', '2017-01-09', '2017-01-10', '2017-01-11',
               '2017-01-12', '2017-01-13', '2017-01-16', '2017-01-17',
               '2017-01-18', '2017-01-19', '2017-01-20', '2017-01-23',
               '2017-01-24', '2017-01-25', '2017-01-26', '2017-01-27',
               '2017-01-30', '2017-01-31', '2017-02-01', '2017-02-02',
               '2017-02-03', '2017-02-06', '2017-02-07', '2017-02-08',
               '2017-02-09', '2017-02-10', '2017-02-13', '2017-02-14',
               '2017-02-15', '2017-02-16', '2017-02-17', '2017-02-20',
               '2017-02-21', '2017-02-22', '2017-02-23', '2017-02-24',
               '2017-02-27', '2017-02-28', '2017-03-01', '2017-03-02',
               '2017-03-03', '2017-03-06', '2017-03-07', '2017-03-08',
               '2017-03-09', '2017-03-10', '2017-03-13', '2017-03-14',
               '2017-03-15', '2017-03-16', '2017-03-17', '2017-03-20',
               '2017-03-21', '2017-03-22', '2017-03-23', '2017-03-24',
               '2017-03-27', '2017-03-28', '2017-03-29', '2017-03-30',
               '2017-03-31', '2017-04-03', '2017-04-04', '2017-04-05',
               '2017-04-06', '2017-04-07', '2017-04-10', '2017-04-11'],
              dtype='datetime64[ns]', freq='B')
In [886]:
import numpy as np
ts=pd.Series(np.random.randint(1,10,len(df)),index=df)
ts
Out[886]:
2017-01-02    1
2017-01-03    8
2017-01-04    4
2017-01-05    6
2017-01-06    1
             ..
2017-04-05    1
2017-04-06    2
2017-04-07    4
2017-04-10    2
2017-04-11    3
Freq: B, Length: 72, dtype: int32

bdate_range

  • bdate_range() returns a fixed-frequency DatetimeIndex, with business day ('B') as the default frequency.
  • More Information
In [887]:
bd=pd.bdate_range(start='9/1/2019', end='9/16/2019')
bd
Out[887]:
DatetimeIndex(['2019-09-02', '2019-09-03', '2019-09-04', '2019-09-05',
               '2019-09-06', '2019-09-09', '2019-09-10', '2019-09-11',
               '2019-09-12', '2019-09-13', '2019-09-16'],
              dtype='datetime64[ns]', freq='B')
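As a quick check of the description above, the sketch below compares bdate_range with date_range called with freq='B' over the same dates. The variable names (bd, dr, same_index) are illustrative only.

```python
import pandas as pd

# Business days between 1 and 16 September 2019 (1 September is a Sunday).
bd = pd.bdate_range(start='2019-09-01', end='2019-09-16')

# date_range needs freq='B' spelled out to produce the same index.
dr = pd.date_range(start='2019-09-01', end='2019-09-16', freq='B')

same_index = bd.equals(dr)
print(len(bd), same_index)
```

In other words, bdate_range can be read as a shortcut for date_range with a business-day frequency.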

timedelta_range

  • Return a fixed frequency TimedeltaIndex, with day as the default frequency
In [888]:
tr=pd.timedelta_range(start='1 day', periods=4)
tr
Out[888]:
TimedeltaIndex(['1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq='D')
In [889]:
tr=pd.timedelta_range(start='1 day', periods=4,closed='right')
tr
Out[889]:
TimedeltaIndex(['2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq='D')
In [890]:
tr=pd.timedelta_range(start='1 day', periods=4,closed='left')
tr
Out[890]:
TimedeltaIndex(['1 days', '2 days', '3 days'], dtype='timedelta64[ns]', freq='D')
In [891]:
tr=pd.timedelta_range(start='1 day', end='2 days', freq='6H')
tr
Out[891]:
TimedeltaIndex(['1 days 00:00:00', '1 days 06:00:00', '1 days 12:00:00',
                '1 days 18:00:00', '2 days 00:00:00'],
               dtype='timedelta64[ns]', freq='6H')
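One possible use of a TimedeltaIndex, sketched here under the assumption that the offsets are applied to a fixed anchor date (the anchor and the variable names are made up for illustration), is to shift a Timestamp by each step:

```python
import pandas as pd

# Four one-day steps: 1 day, 2 days, 3 days, 4 days.
steps = pd.timedelta_range(start='1 day', periods=4)

# Adding a Timestamp to the offsets yields a DatetimeIndex.
anchor = pd.Timestamp('2019-01-01')
dates = steps + anchor
print(dates)
```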

interval_range

In [892]:
ir=pd.interval_range(start=0, end=5)
ir
Out[892]:
IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4], (4, 5]],
              closed='right',
              dtype='interval[int64]')
In [893]:
ir= pd.interval_range(start=pd.Timestamp('2017-01-01'),
                   end=pd.Timestamp('2017-01-04'))
ir
Out[893]:
IntervalIndex([(2017-01-01, 2017-01-02], (2017-01-02, 2017-01-03], (2017-01-03, 2017-01-04]],
              closed='right',
              dtype='interval[datetime64[ns]]')
In [894]:
ir=pd.interval_range(start=0, periods=4, freq=1.5)
ir
Out[894]:
IntervalIndex([(0.0, 1.5], (1.5, 3.0], (3.0, 4.5], (4.5, 6.0]],
              closed='right',
              dtype='interval[float64]')
In [895]:
ir= pd.interval_range(start=pd.Timestamp('2019-01-01'),
                   periods=3, freq='MS')
ir
Out[895]:
IntervalIndex([(2019-01-01, 2019-02-01], (2019-02-01, 2019-03-01], (2019-03-01, 2019-04-01]],
              closed='right',
              dtype='interval[datetime64[ns]]')
In [896]:
ir=pd.interval_range(start=0, end=6, periods=4)
ir
Out[896]:
IntervalIndex([(0.0, 1.5], (1.5, 3.0], (3.0, 4.5], (4.5, 6.0]],
              closed='right',
              dtype='interval[float64]')
In [897]:
ir=pd.interval_range(end=5, periods=4, closed='both')
ir
Out[897]:
IntervalIndex([[1, 2], [2, 3], [3, 4], [4, 5]],
              closed='both',
              dtype='interval[int64]')
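The cells above only build IntervalIndex objects. A minimal sketch of one way they might be used is as bins for pd.cut; the sample values below are invented for illustration.

```python
import pandas as pd

# Right-closed unit bins: (0, 1], (1, 2], ..., (4, 5].
bins = pd.interval_range(start=0, end=5)

# pd.cut accepts an IntervalIndex as its bins argument.
values = [0.5, 1.0, 4.2, 0.0]
binned = pd.cut(values, bins=bins)
print(binned)
```

Because the intervals are closed on the right, 1.0 falls into (0, 1] while 0.0 falls into no bin and comes back as NaN.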

CustomBusinessDay

In [898]:
from IPython.display import YouTubeVideo
YouTubeVideo('Fo0IMzfcnQE',width=900, height=500)
Out[898]:

US Holidays

In [899]:
df= pd.read_csv('AAPLus.csv')
df.head()
Out[899]:
Open High Low Close Adj Close Volume
0 183.820007 187.300003 183.419998 187.179993 185.877258 17731300
1 187.78999299999998 187.949997 183.53999299999998 183.919998 182.63995400000002 13954800
2 185.259995 186.41000400000001 184.279999 185.399994 184.10965 16604200
3 185.419998 188.429993 185.199997 187.970001 186.661789 17485200
4 189.5 190.679993 189.300003 190.580002 189.253616 19756600
In [900]:
df1= pd.date_range(start="7/1/2018", end="7/31/2018",freq='B')
df1
Out[900]:
DatetimeIndex(['2018-07-02', '2018-07-03', '2018-07-04', '2018-07-05',
               '2018-07-06', '2018-07-09', '2018-07-10', '2018-07-11',
               '2018-07-12', '2018-07-13', '2018-07-16', '2018-07-17',
               '2018-07-18', '2018-07-19', '2018-07-20', '2018-07-23',
               '2018-07-24', '2018-07-25', '2018-07-26', '2018-07-27',
               '2018-07-30', '2018-07-31'],
              dtype='datetime64[ns]', freq='B')
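  • 2018-07-04 (Independence Day) is a US national holiday, but freq='B' still lists it because the plain business-day frequency only skips weekends.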
In [901]:
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
usb=CustomBusinessDay(calendar=USFederalHolidayCalendar())
usb
Out[901]:
<CustomBusinessDay>
In [902]:
df1= pd.date_range(start="7/1/2018", end="7/31/2018",freq=usb)
df1
Out[902]:
DatetimeIndex(['2018-07-02', '2018-07-03', '2018-07-05', '2018-07-06',
               '2018-07-09', '2018-07-10', '2018-07-11', '2018-07-12',
               '2018-07-13', '2018-07-16', '2018-07-17', '2018-07-18',
               '2018-07-19', '2018-07-20', '2018-07-23', '2018-07-24',
               '2018-07-25', '2018-07-26', '2018-07-27', '2018-07-30',
               '2018-07-31'],
              dtype='datetime64[ns]', freq='C')
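To see exactly which dates the holiday calendar removes, the two indexes can be compared directly. This is a small sketch; the names us_bday, plain, custom and dropped are illustrative.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

us_bday = CustomBusinessDay(calendar=USFederalHolidayCalendar())

# Weekdays only versus weekdays minus US federal holidays.
plain = pd.date_range(start='2018-07-01', end='2018-07-31', freq='B')
custom = pd.date_range(start='2018-07-01', end='2018-07-31', freq=us_bday)

# Dates present in the plain index but missing from the custom one.
dropped = plain.difference(custom)
print(dropped)
```

The custom index has one entry fewer, which matters when it is used with set_index: the index length has to match the number of rows in the DataFrame.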
In [903]:
df.set_index(df1,inplace=True)
df
Out[903]:
Open High Low Close Adj Close Volume
2018-07-02 183.820007 187.300003 183.419998 187.179993 185.877258 17731300
2018-07-03 187.78999299999998 187.949997 183.53999299999998 183.919998 182.63995400000002 13954800
2018-07-05 185.259995 186.41000400000001 184.279999 185.399994 184.10965 16604200
2018-07-06 185.419998 188.429993 185.199997 187.970001 186.661789 17485200
2018-07-09 189.5 190.679993 189.300003 190.580002 189.253616 19756600
2018-07-10 190.71000700000002 191.279999 190.179993 190.350006 189.02520800000002 15939100
2018-07-11 188.5 189.779999 187.610001 187.880005 186.572403 18831500
2018-07-12 189.529999 191.41000400000001 189.309998 191.029999 189.70048500000001 18041100
2018-07-13 191.080002 191.83999599999999 190.899994 191.330002 189.998398 12513900
2018-07-16 191.520004 192.649994 190.419998 190.91000400000001 189.581314 15043100
2018-07-17 189.75 191.869995 189.199997 191.449997 190.117554 15534500
2018-07-18 191.779999 191.800003 189.929993 190.399994 189.07486 16393400
2018-07-19 189.690002 192.550003 189.690002 191.880005 190.544571 20286800
2018-07-20 191.779999 192.429993 190.169998 191.440002 190.10763500000002 20676200
2018-07-23 190.679993 191.96000700000002 189.559998 191.610001 190.276443 15989400
2018-07-24 192.449997 193.66000400000001 192.050003 193.0 191.656769 18697900
2018-07-25 193.059998 194.850006 192.429993 194.820007 193.464111 16709900
2018-07-26 194.610001 195.96000700000002 193.610001 194.21000700000002 192.858353 19076000
2018-07-27 194.990005 195.190002 190.100006 190.979996 189.650818 24024000
2018-07-30 191.899994 192.199997 189.070007 189.91000400000001 188.588272 21029500
2018-07-31 190.300003 192.139999 189.33999599999999 190.28999299999998 188.965622 39373000

Pakistan Holidays

In [904]:
df= pd.read_csv('AAPLpk.csv')
df
Out[904]:
Open High Low Close Adj Close Volume
0 199.130005 201.759995 197.309998 201.5 200.09761 67935700
1 200.580002 208.380005 200.350006 207.389999 205.946625 62404000
2 207.029999 208.740005 205.479996 207.990005 206.54245 33447400
3 208.0 209.25 207.070007 209.070007 207.61492900000002 25425400
4 209.320007 209.5 206.759995 207.110001 205.668579 25587400
5 206.050003 207.809998 204.520004 207.25 205.807602 22525500
6 209.529999 209.77999900000003 207.199997 208.880005 207.426254 23492600
7 207.360001 209.100006 206.669998 207.529999 206.808411 24611200
8 209.309998 210.94999700000002 207.699997 208.869995 208.143753 25890900
9 209.220001 210.74000499999997 208.330002 210.24000499999997 209.50898700000002 28807600
10 211.75 213.809998 211.47000099999997 213.32000699999998 212.57829300000003 28500400
11 213.440002 217.94999700000002 213.16000400000001 217.580002 216.82347099999998 35427000
12 218.100006 219.17999300000002 215.11000099999998 215.460007 214.710846 30287700
13 216.80000299999998 217.190002 214.02999900000003 215.039993 214.29229700000002 26159800
14 214.100006 216.36000099999998 213.83999599999999 215.05000299999998 214.302261 19018100
15 214.649994 217.05000299999998 214.600006 215.49000499999997 214.740738 18883200
16 216.600006 216.899994 215.11000099999998 216.16000400000001 215.40840099999997 18476400
17 217.149994 218.74000499999997 216.330002 217.940002 217.18222000000003 20525100
18 219.00999500000003 220.539993 218.919998 219.69999700000002 218.936096 22776800
19 220.149994 223.49000499999997 219.41000400000001 222.979996 222.204681 27254800
20 223.25 228.25999500000003 222.399994 225.02999900000003 224.247559 48793800
21 226.50999500000003 228.86999500000002 226.0 227.63000499999998 226.83853100000002 43340100
In [905]:
df1= pd.date_range(start="8/1/2018", end="8/31/2018",freq='B')
df1
Out[905]:
DatetimeIndex(['2018-08-01', '2018-08-02', '2018-08-03', '2018-08-06',
               '2018-08-07', '2018-08-08', '2018-08-09', '2018-08-10',
               '2018-08-13', '2018-08-14', '2018-08-15', '2018-08-16',
               '2018-08-17', '2018-08-20', '2018-08-21', '2018-08-22',
               '2018-08-23', '2018-08-24', '2018-08-27', '2018-08-28',
               '2018-08-29', '2018-08-30', '2018-08-31'],
              dtype='datetime64[ns]', freq='B')
In [906]:
'''
class USFederalHolidayCalendar(AbstractHolidayCalendar):
    """
    US Federal Government Holiday Calendar based on rules specified by:
    https://www.opm.gov/policy-data-oversight/
       snow-dismissal-procedures/federal-holidays/
    """
    rules = [
        Holiday('New Years Day', month=1, day=1, observance=nearest_workday),
        USMartinLutherKingJr,
        USPresidentsDay,
        USMemorialDay,
        Holiday('July 4th', month=7, day=4, observance=nearest_workday),
        USLaborDay,
        USColumbusDay,
        Holiday('Veterans Day', month=11, day=11, observance=nearest_workday),
        USThanksgivingDay,
        Holiday('Christmas', month=12, day=25, observance=nearest_workday)
    ]
    '''
In [907]:
# Pakistan Holidays Calendar Code:
# Import Required Lib:
from pandas.tseries.holiday import AbstractHolidayCalendar, nearest_workday, Holiday

class PakistanFederalHolidayCalendar(AbstractHolidayCalendar):
    """
    Pakistan Federal Government Holiday Calendar based on rules specified by:
    https://publicholidays.pk/2018-dates/
    """
    rules = [
        Holiday('Independence Day', month=8, day=14)
    ]
pakc=CustomBusinessDay(calendar=PakistanFederalHolidayCalendar())
pakc
Out[907]:
<CustomBusinessDay>
In [908]:
df1= pd.date_range(start="8/1/2018", end="8/31/2018",freq=pakc)
df1
Out[908]:
DatetimeIndex(['2018-08-01', '2018-08-02', '2018-08-03', '2018-08-06',
               '2018-08-07', '2018-08-08', '2018-08-09', '2018-08-10',
               '2018-08-13', '2018-08-15', '2018-08-16', '2018-08-17',
               '2018-08-20', '2018-08-21', '2018-08-22', '2018-08-23',
               '2018-08-24', '2018-08-27', '2018-08-28', '2018-08-29',
               '2018-08-30', '2018-08-31'],
              dtype='datetime64[ns]', freq='C')
In [909]:
df.set_index(df1,inplace=True)
df
Out[909]:
Open High Low Close Adj Close Volume
2018-08-01 199.130005 201.759995 197.309998 201.5 200.09761 67935700
2018-08-02 200.580002 208.380005 200.350006 207.389999 205.946625 62404000
2018-08-03 207.029999 208.740005 205.479996 207.990005 206.54245 33447400
2018-08-06 208.0 209.25 207.070007 209.070007 207.61492900000002 25425400
2018-08-07 209.320007 209.5 206.759995 207.110001 205.668579 25587400
2018-08-08 206.050003 207.809998 204.520004 207.25 205.807602 22525500
2018-08-09 209.529999 209.77999900000003 207.199997 208.880005 207.426254 23492600
2018-08-10 207.360001 209.100006 206.669998 207.529999 206.808411 24611200
2018-08-13 209.309998 210.94999700000002 207.699997 208.869995 208.143753 25890900
2018-08-15 209.220001 210.74000499999997 208.330002 210.24000499999997 209.50898700000002 28807600
2018-08-16 211.75 213.809998 211.47000099999997 213.32000699999998 212.57829300000003 28500400
2018-08-17 213.440002 217.94999700000002 213.16000400000001 217.580002 216.82347099999998 35427000
2018-08-20 218.100006 219.17999300000002 215.11000099999998 215.460007 214.710846 30287700
2018-08-21 216.80000299999998 217.190002 214.02999900000003 215.039993 214.29229700000002 26159800
2018-08-22 214.100006 216.36000099999998 213.83999599999999 215.05000299999998 214.302261 19018100
2018-08-23 214.649994 217.05000299999998 214.600006 215.49000499999997 214.740738 18883200
2018-08-24 216.600006 216.899994 215.11000099999998 216.16000400000001 215.40840099999997 18476400
2018-08-27 217.149994 218.74000499999997 216.330002 217.940002 217.18222000000003 20525100
2018-08-28 219.00999500000003 220.539993 218.919998 219.69999700000002 218.936096 22776800
2018-08-29 220.149994 223.49000499999997 219.41000400000001 222.979996 222.204681 27254800
2018-08-30 223.25 228.25999500000003 222.399994 225.02999900000003 224.247559 48793800
2018-08-31 226.50999500000003 228.86999500000002 226.0 227.63000499999998 226.83853100000002 43340100

Example

  • Pandas time series analysis, holidays: for a holiday that occurs on a fixed date (e.g. July 4th), an observance rule determines when the holiday is observed if it falls on a weekend or some other non-observed day.
In [910]:
df= pd.read_csv('AAPLusaeed.csv')
df
Out[910]:
Open High Low Close Adj Close Volume
0 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
1 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900
3 172.53999299999998 173.470001 172.080002 173.029999 170.47070300000001 22434600
4 173.440002 175.369995 173.050003 175.0 172.41156 23660000
5 174.550003 175.059998 173.41000400000001 174.330002 171.751465 21584000
6 173.16000400000001 174.300003 173.0 174.28999299999998 171.712051 23959900
7 174.58999599999999 175.490005 174.490005 175.279999 172.687408 18667700
8 176.179993 177.360001 175.649994 177.08999599999999 174.470642 25226000
9 177.899994 179.389999 176.139999 176.190002 173.583969 29565900
10 177.899994 179.389999 176.139999 176.190002 173.583969 29565900
11 176.149994 179.25 175.070007 179.100006 176.450928 34386800
12 179.369995 180.100006 178.25 179.259995 176.608551 31193400
13 178.610001 179.580002 177.41000400000001 178.46000700000002 175.82038899999998 32425100
14 177.300003 177.779999 176.600006 177.0 174.381973 27108600
15 177.300003 179.440002 176.820007 177.03999299999998 174.421387 32689100
16 177.25 177.300003 173.199997 174.220001 171.643082 51105100
17 174.509995 174.949997 170.529999 171.110001 168.57908600000002 41529000
18 172.0 172.0 170.059998 171.509995 168.973175 39143000
19 170.16000400000001 170.16000400000001 167.070007 167.96000700000002 165.475677 50640400
20 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
21 166.869995 168.440002 166.5 167.429993 164.953522 32478900
In [911]:
### Umer Saeed Birthday Calendar Code:
### Import Required Lib:
from pandas.tseries.holiday import AbstractHolidayCalendar, nearest_workday, Holiday

class UmerSaeedBirthdayCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday('Umer Saeed Birthday', month=1, day=7,observance=nearest_workday),
    ]
ubc=CustomBusinessDay(calendar=UmerSaeedBirthdayCalendar())
ubc
Out[911]:
<CustomBusinessDay>
In [912]:
df1= pd.date_range(start="1/1/2018", end="1/31/2018",freq=ubc)
df1
Out[912]:
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05', '2018-01-09', '2018-01-10', '2018-01-11',
               '2018-01-12', '2018-01-15', '2018-01-16', '2018-01-17',
               '2018-01-18', '2018-01-19', '2018-01-22', '2018-01-23',
               '2018-01-24', '2018-01-25', '2018-01-26', '2018-01-29',
               '2018-01-30', '2018-01-31'],
              dtype='datetime64[ns]', freq='C')
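The index above skips 2018-01-08 rather than 2018-01-07 because 7 January 2018 is a Sunday and nearest_workday moves the observance to the following Monday. The sketch below illustrates this with a stand-in calendar; the class name BirthdayCalendar is hypothetical.

```python
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, nearest_workday

# nearest_workday moves Saturday to Friday and Sunday to Monday.
sunday_moved = nearest_workday(pd.Timestamp('2018-01-07'))
saturday_moved = nearest_workday(pd.Timestamp('2018-01-06'))

class BirthdayCalendar(AbstractHolidayCalendar):
    # Hypothetical calendar with a single fixed-date rule.
    rules = [Holiday('Birthday', month=1, day=7, observance=nearest_workday)]

# Observed holiday dates within January 2018.
observed = BirthdayCalendar().holidays(start='2018-01-01', end='2018-01-31')
print(sunday_moved, saturday_moved, observed)
```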
In [913]:
df.set_index(df1,inplace=True)
df
Out[913]:
Open High Low Close Adj Close Volume
2018-01-01 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-02 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-03 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900
2018-01-04 172.53999299999998 173.470001 172.080002 173.029999 170.47070300000001 22434600
2018-01-05 173.440002 175.369995 173.050003 175.0 172.41156 23660000
2018-01-09 174.550003 175.059998 173.41000400000001 174.330002 171.751465 21584000
2018-01-10 173.16000400000001 174.300003 173.0 174.28999299999998 171.712051 23959900
2018-01-11 174.58999599999999 175.490005 174.490005 175.279999 172.687408 18667700
2018-01-12 176.179993 177.360001 175.649994 177.08999599999999 174.470642 25226000
2018-01-15 177.899994 179.389999 176.139999 176.190002 173.583969 29565900
2018-01-16 177.899994 179.389999 176.139999 176.190002 173.583969 29565900
2018-01-17 176.149994 179.25 175.070007 179.100006 176.450928 34386800
2018-01-18 179.369995 180.100006 178.25 179.259995 176.608551 31193400
2018-01-19 178.610001 179.580002 177.41000400000001 178.46000700000002 175.82038899999998 32425100
2018-01-22 177.300003 177.779999 176.600006 177.0 174.381973 27108600
2018-01-23 177.300003 179.440002 176.820007 177.03999299999998 174.421387 32689100
2018-01-24 177.25 177.300003 173.199997 174.220001 171.643082 51105100
2018-01-25 174.509995 174.949997 170.529999 171.110001 168.57908600000002 41529000
2018-01-26 172.0 172.0 170.059998 171.509995 168.973175 39143000
2018-01-29 170.16000400000001 170.16000400000001 167.070007 167.96000700000002 165.475677 50640400
2018-01-30 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-31 166.869995 168.440002 166.5 167.429993 164.953522 32478900
In [914]:
from IPython.display import Image
Image(filename='Rules.png')
Out[914]:

Custom weekend

In [915]:
ksa=CustomBusinessDay(weekmask='Sun Mon Tue Wed Thu')
In [916]:
df= pd.date_range(start="1/1/2018", end="1/31/2018",freq=ksa)
df
Out[916]:
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-07', '2018-01-08', '2018-01-09', '2018-01-10',
               '2018-01-11', '2018-01-14', '2018-01-15', '2018-01-16',
               '2018-01-17', '2018-01-18', '2018-01-21', '2018-01-22',
               '2018-01-23', '2018-01-24', '2018-01-25', '2018-01-28',
               '2018-01-29', '2018-01-30', '2018-01-31'],
              dtype='datetime64[ns]', freq='C')
In [917]:
ksa=CustomBusinessDay(weekmask='Sun Mon Tue Wed Thu',holidays=['2018-01-08'])
In [918]:
df= pd.date_range(start="1/1/2018", end="1/31/2018",freq=ksa)
df
Out[918]:
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-07', '2018-01-09', '2018-01-10', '2018-01-11',
               '2018-01-14', '2018-01-15', '2018-01-16', '2018-01-17',
               '2018-01-18', '2018-01-21', '2018-01-22', '2018-01-23',
               '2018-01-24', '2018-01-25', '2018-01-28', '2018-01-29',
               '2018-01-30', '2018-01-31'],
              dtype='datetime64[ns]', freq='C')
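A CustomBusinessDay offset can also be added to a single Timestamp, which makes the effect of weekmask and holidays easy to see. A minimal sketch, with illustrative variable names:

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Working week runs Sunday to Thursday; Friday and Saturday are the weekend.
ksa = CustomBusinessDay(weekmask='Sun Mon Tue Wed Thu')
ksa_holiday = CustomBusinessDay(weekmask='Sun Mon Tue Wed Thu',
                                holidays=['2018-01-08'])

# Thursday plus one working day skips Friday and Saturday.
after_thursday = pd.Timestamp('2018-01-04') + ksa

# Sunday plus one working day skips the Monday holiday.
after_sunday = pd.Timestamp('2018-01-07') + ksa_holiday
print(after_thursday, after_sunday)
```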

to_datetime

In [919]:
from IPython.display import YouTubeVideo
YouTubeVideo('igWjq3jtLYI',width=900, height=500)
Out[919]:
  • A common problem in data analysis is the lack of uniformity in the structure of input data.
In [920]:
from IPython.display import Image
Image(filename='date_format.png')
Out[920]:

Example-1

In [921]:
df= pd.read_csv('date_data.txt')
df
Out[921]:
Date
0 2018-12-11
1 Dec 11 2018
2 12/11/2018
3 2018.12.11
4 2018/12/11
In [922]:
df.dtypes
Out[922]:
Date    object
dtype: object
In [923]:
df = df.apply(pd.to_datetime)
df
Out[923]:
Date
0 2018-12-11
1 2018-12-11
2 2018-12-11
3 2018-12-11
4 2018-12-11
In [924]:
df.dtypes
Out[924]:
Date    datetime64[ns]
dtype: object
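The conversion above depends on the parser handling each string on its own. Depending on the installed pandas version, parsing a whole column of mixed formats in one call may instead raise an error or need extra arguments; that is an assumption about version differences, not something shown in this notebook. A version-agnostic sketch is to parse the strings one at a time (only a subset of the formats above is used here):

```python
import pandas as pd

# Same date written in several styles.
raw = ['2018-12-11', 'Dec 11 2018', '12/11/2018', '2018/12/11']

# Parse each string separately so every value gets its own format guess.
parsed = pd.DatetimeIndex([pd.to_datetime(s) for s in raw])
print(parsed)
```

This is slower than a single vectorised call, so it is best kept for small or genuinely messy inputs.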

Example-2

In [925]:
df= pd.read_csv('date_data_time.txt')
df
Out[925]:
Date
0 2018-12-11 2:30:00 PM
1 Dec 11 2018 14:30:00
2 12/11/2018
3 2018.12.11
4 2018/12/11
In [926]:
df.dtypes
Out[926]:
Date    object
dtype: object
In [927]:
df = df.apply(pd.to_datetime)
df
Out[927]:
Date
0 2018-12-11 14:30:00
1 2018-12-11 14:30:00
2 2018-12-11 00:00:00
3 2018-12-11 00:00:00
4 2018-12-11 00:00:00

Example-3

In [928]:
from IPython.display import Image
Image(filename='Format_D_U_E.png')
Out[928]:
  • The string '5/1/2018' means 5 January in Europe (day first), but pd.to_datetime assumes month first by default and returns 1 May.
In [929]:
pd.to_datetime('5/1/2018')
Out[929]:
Timestamp('2018-05-01 00:00:00')

Solution

In [930]:
pd.to_datetime('5/1/2018',dayfirst=True)
Out[930]:
Timestamp('2018-01-05 00:00:00')
In [931]:
pd.to_datetime('5$1$2018',format='%d$%m$%Y')
Out[931]:
Timestamp('2018-01-05 00:00:00')
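The same idea applies to a whole column. The sketch below assumes a Series of day-first strings (the sample values are made up) and passes an explicit format so that nothing is left to guessing:

```python
import pandas as pd

# Day-first dates as commonly written in Europe.
raw = pd.Series(['05/01/2018', '25/12/2018'])

# An explicit format removes the day/month ambiguity.
parsed = pd.to_datetime(raw, format='%d/%m/%Y')
print(parsed)
```

An explicit format also fails loudly on values that do not match it, which is often preferable to a silent misparse.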

Error Handling

In [932]:
df=['2019-07-01','2019-07-02','2019-07-03','2019-07-04','2019-07-05','abc']
In [933]:
pd.to_datetime(df)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   1978         try:
-> 1979             values, tz_parsed = conversion.datetime_to_datetime64(data)
   1980             # If tzaware, these values represent unix timestamps, so we

pandas\_libs\tslibs\conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()

TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-933-bf9335ca4ff6> in <module>
----> 1 pd.to_datetime(df)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    206                 else:
    207                     kwargs[new_arg_name] = new_arg_value
--> 208             return func(*args, **kwargs)
    209 
    210         return wrapper

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, box, format, exact, unit, infer_datetime_format, origin, cache)
    789             result = _convert_and_box_cache(arg, cache_array, box)
    790         else:
--> 791             result = convert_listlike(arg, box, format)
    792     else:
    793         result = convert_listlike(np.array([arg]), box, format)[0]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike_datetimes(arg, box, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
    458             errors=errors,
    459             require_iso8601=require_iso8601,
--> 460             allow_object=True,
    461         )
    462 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   1982             return values.view("i8"), tz_parsed
   1983         except (ValueError, TypeError):
-> 1984             raise e
   1985 
   1986     if tz_parsed is not None:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   1973             dayfirst=dayfirst,
   1974             yearfirst=yearfirst,
-> 1975             require_iso8601=require_iso8601,
   1976         )
   1977     except ValueError as e:

pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime_object()

pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime_object()

pandas\_libs\tslibs\parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string()

C:\ProgramData\Anaconda3\lib\site-packages\dateutil\parser\_parser.py in parse(timestr, parserinfo, **kwargs)
   1356         return parser(parserinfo).parse(timestr, **kwargs)
   1357     else:
-> 1358         return DEFAULTPARSER.parse(timestr, **kwargs)
   1359 
   1360 

C:\ProgramData\Anaconda3\lib\site-packages\dateutil\parser\_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    647 
    648         if res is None:
--> 649             raise ValueError("Unknown string format:", timestr)
    650 
    651         if len(res) == 0:

ValueError: ('Unknown string format:', 'abc')

Solution-1

In [934]:
pd.to_datetime(df,errors='ignore')
Out[934]:
Index(['2019-07-01', '2019-07-02', '2019-07-03', '2019-07-04', '2019-07-05',
       'abc'],
      dtype='object')

Solution-2

In [935]:
pd.to_datetime(df,errors='coerce')
Out[935]:
DatetimeIndex(['2019-07-01', '2019-07-02', '2019-07-03', '2019-07-04',
               '2019-07-05', 'NaT'],
              dtype='datetime64[ns]', freq=None)
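With errors='coerce' the invalid entries become NaT, so they can be located afterwards. A minimal sketch with illustrative names follows; note that recent pandas releases deprecate errors='ignore', which makes the coerce approach the safer habit.

```python
import pandas as pd

raw = pd.Series(['2019-07-01', '2019-07-02', 'abc'])

# Unparseable values are replaced by NaT instead of raising.
parsed = pd.to_datetime(raw, errors='coerce')

# Use the NaT positions to recover the offending input strings.
bad_values = raw[parsed.isna()].tolist()
print(bad_values)
```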

Epoch unix time

  • Epoch (Unix) time is the number of seconds that have elapsed since Jan 1, 1970 00:00:00 UTC.
In [936]:
from IPython.display import Image
Image(filename='Epoch_Time.png')
Out[936]:
In [937]:
t=1570434845
# t is in seconds, so unit='s' must be passed
# (unit is a string and defaults to 'ns')
df=pd.to_datetime([t],unit='s')
df
Out[937]:
DatetimeIndex(['2019-10-07 07:54:05'], dtype='datetime64[ns]', freq=None)
In [938]:
df.view('int64')
Out[938]:
array([1570434845000000000], dtype=int64)
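The int64 view above is in nanoseconds. To get back to whole seconds, one option (a sketch, not the only way) is to subtract the epoch and divide by a one-second Timedelta:

```python
import pandas as pd

t = 1570434845  # seconds since 1970-01-01 00:00:00 UTC

# Seconds to Timestamp.
ts = pd.to_datetime(t, unit='s')

# Timestamp back to seconds: elapsed time divided by one second.
seconds = (ts - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')
print(ts, seconds)
```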

to_timedelta

  • Convert argument to timedelta.
  • Timedeltas are absolute differences in times, expressed in different units (e.g. days, hours, minutes, seconds).
  • This method converts an argument from a recognized timedelta format / value into a Timedelta type.
In [939]:
td = pd.Timedelta('3 days 06:05:01.000000111') 
td
Out[939]:
Timedelta('3 days 06:05:01.000000')
In [940]:
td.seconds
Out[940]:
21901
In [941]:
td = pd.Timedelta(133, unit ='s')
td
Out[941]:
Timedelta('0 days 00:02:13')
In [942]:
td.seconds
Out[942]:
133
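The cells above construct pd.Timedelta objects directly. The sketch below shows pd.to_timedelta itself on a string and on a list, and also highlights that .seconds holds only the part of the duration below one day, while total_seconds() covers the whole duration.

```python
import pandas as pd

# A single string becomes a Timedelta.
td = pd.to_timedelta('3 days 06:05:01')

# A list becomes a TimedeltaIndex.
tds = pd.to_timedelta(['1 days', '2 days 12:00:00'])

# .seconds ignores the days component; total_seconds() does not.
print(td.days, td.seconds, td.total_seconds())
print(tds)
```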

Period

Frequency as a Year

Method-1

In [943]:
a= pd.Period('2019')
a
Out[943]:
Period('2019', 'A-DEC')
In [944]:
a.start_time
Out[944]:
Timestamp('2019-01-01 00:00:00')
In [945]:
a.end_time
Out[945]:
Timestamp('2019-12-31 23:59:59.999999999')

Method-2

In [946]:
a=pd.Period('2019',freq='A')
a
Out[946]:
Period('2019', 'A-DEC')
In [947]:
a.start_time
Out[947]:
Timestamp('2019-01-01 00:00:00')
In [948]:
a.end_time
Out[948]:
Timestamp('2019-12-31 23:59:59.999999999')

Arithmetic Functions

Example-1

In [949]:
a= pd.Period('2019')
a=a+1
In [950]:
a.start_time
Out[950]:
Timestamp('2020-01-01 00:00:00')
In [951]:
a.end_time
Out[951]:
Timestamp('2020-12-31 23:59:59.999999999')

Example-2

In [952]:
a=pd.Period('2019',freq='A')
a=a+1
a
Out[952]:
Period('2020', 'A-DEC')
In [953]:
a.start_time
Out[953]:
Timestamp('2020-01-01 00:00:00')
In [954]:
a.end_time
Out[954]:
Timestamp('2020-12-31 23:59:59.999999999')

Frequency as a Month

Method-1

In [955]:
m=pd.Period('2019',freq='M')
m
Out[955]:
Period('2019-01', 'M')
In [956]:
m.start_time
Out[956]:
Timestamp('2019-01-01 00:00:00')
In [957]:
m.end_time
Out[957]:
Timestamp('2019-01-31 23:59:59.999999999')

Method-2

In [958]:
m=pd.Period('2019-10')
m
Out[958]:
Period('2019-10', 'M')
In [959]:
m.start_time
Out[959]:
Timestamp('2019-10-01 00:00:00')
In [960]:
m.end_time
Out[960]:
Timestamp('2019-10-31 23:59:59.999999999')

Arithmetic Functions

Example-1

In [961]:
m=pd.Period('2019',freq='M')
m=m+1
m
Out[961]:
Period('2019-02', 'M')
In [962]:
m.start_time
Out[962]:
Timestamp('2019-02-01 00:00:00')
In [963]:
m.end_time
Out[963]:
Timestamp('2019-02-28 23:59:59.999999999')

Example-2

In [964]:
m=pd.Period('2019-12')
m=m+1
m
Out[964]:
Period('2020-01', 'M')
In [965]:
m.start_time
Out[965]:
Timestamp('2020-01-01 00:00:00')
In [966]:
m.end_time
Out[966]:
Timestamp('2020-01-31 23:59:59.999999999')

Frequency as a Day

Method-1

In [967]:
d=pd.Period('2019-10-07')
d
Out[967]:
Period('2019-10-07', 'D')
In [968]:
d.start_time
Out[968]:
Timestamp('2019-10-07 00:00:00')
In [969]:
d.end_time
Out[969]:
Timestamp('2019-10-07 23:59:59.999999999')

Method-2

In [970]:
d=pd.Period('2019-10-07',freq='D')
d
Out[970]:
Period('2019-10-07', 'D')
In [971]:
d.start_time
Out[971]:
Timestamp('2019-10-07 00:00:00')
In [972]:
d.end_time
Out[972]:
Timestamp('2019-10-07 23:59:59.999999999')

Arithmetic Functions

Example-1

In [973]:
d=pd.Period('2018-02-28')
d=d+1
d
Out[973]:
Period('2018-03-01', 'D')
In [974]:
d.start_time
Out[974]:
Timestamp('2018-03-01 00:00:00')
In [975]:
d.end_time
Out[975]:
Timestamp('2018-03-01 23:59:59.999999999')

Example-2

In [976]:
d=pd.Period('2020-02-28',freq='D')
d=d+1
d
Out[976]:
Period('2020-02-29', 'D')
In [977]:
d.start_time
Out[977]:
Timestamp('2020-02-29 00:00:00')
In [978]:
d.end_time
Out[978]:
Timestamp('2020-02-29 23:59:59.999999999')

Leap Year

In [979]:
pd.Period('2018-03').is_leap_year
Out[979]:
False
In [980]:
pd.Period('2018').is_leap_year
Out[980]:
False
In [981]:
pd.Period('2020-03').is_leap_year
Out[981]:
True
In [982]:
pd.Period('2020-10-10').is_leap_year
Out[982]:
True
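A related check, sketched here with illustrative names, is the number of days in February, which follows the leap-year flag:

```python
import pandas as pd

feb_2018 = pd.Period('2018-02', freq='M')
feb_2020 = pd.Period('2020-02', freq='M')

# days_in_month reflects whether the year is a leap year.
print(feb_2018.days_in_month, feb_2018.is_leap_year)
print(feb_2020.days_in_month, feb_2020.is_leap_year)
```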

Frequency as an Hour

Method-1

In [983]:
h=pd.Period('2020-02-28',freq='H')
h
Out[983]:
Period('2020-02-28 00:00', 'H')
In [984]:
h.start_time
Out[984]:
Timestamp('2020-02-28 00:00:00')
In [985]:
h.end_time
Out[985]:
Timestamp('2020-02-28 00:59:59.999999999')

Method-2

In [986]:
h=pd.Period('2020-02-28 14:00:00',freq='H')
h
Out[986]:
Period('2020-02-28 14:00', 'H')
In [987]:
h.start_time
Out[987]:
Timestamp('2020-02-28 14:00:00')
In [988]:
h.end_time
Out[988]:
Timestamp('2020-02-28 14:59:59.999999999')

Arithmetic Functions

Example-1

In [989]:
h=pd.Period('2020-02-28',freq='H')
h=h+1
h
Out[989]:
Period('2020-02-28 01:00', 'H')
In [990]:
h.start_time
Out[990]:
Timestamp('2020-02-28 01:00:00')
In [991]:
h.end_time
Out[991]:
Timestamp('2020-02-28 01:59:59.999999999')

Example-2

In [992]:
h=pd.Period('2020-02-28 23:00:00',freq='H')
h=h+1
h
Out[992]:
Period('2020-02-29 00:00', 'H')
In [993]:
h.start_time
Out[993]:
Timestamp('2020-02-29 00:00:00')
In [994]:
h.end_time
Out[994]:
Timestamp('2020-02-29 00:59:59.999999999')

Frequency as a Quarter

Method-1

In [995]:
q=pd.Period('2018',freq='Q')
q
Out[995]:
Period('2018Q1', 'Q-DEC')
In [996]:
q.start_time
Out[996]:
Timestamp('2018-01-01 00:00:00')
In [997]:
q.end_time
Out[997]:
Timestamp('2018-03-31 23:59:59.999999999')

Method-2

In [998]:
q=pd.Period('2018-4',freq='Q')
q
Out[998]:
Period('2018Q2', 'Q-DEC')
In [999]:
q.start_time
Out[999]:
Timestamp('2018-04-01 00:00:00')
In [1000]:
q.end_time
Out[1000]:
Timestamp('2018-06-30 23:59:59.999999999')

Method-3

In [1001]:
q=pd.Period('2018Q3',freq='Q')
q
Out[1001]:
Period('2018Q3', 'Q-DEC')
In [1002]:
q.start_time
Out[1002]:
Timestamp('2018-07-01 00:00:00')
In [1003]:
q.end_time
Out[1003]:
Timestamp('2018-09-30 23:59:59.999999999')

Arithmetic Functions

Example-1

In [1004]:
q=pd.Period('2018',freq='Q')
q=q+1
q
Out[1004]:
Period('2018Q2', 'Q-DEC')
In [1005]:
q.start_time
Out[1005]:
Timestamp('2018-04-01 00:00:00')
In [1006]:
q.end_time
Out[1006]:
Timestamp('2018-06-30 23:59:59.999999999')

Example-2

In [1007]:
q=pd.Period('2018-4',freq='Q')
q=q+1
q
Out[1007]:
Period('2018Q3', 'Q-DEC')
In [1008]:
q.start_time
Out[1008]:
Timestamp('2018-07-01 00:00:00')
In [1009]:
q.end_time
Out[1009]:
Timestamp('2018-09-30 23:59:59.999999999')

Example-3

In [1010]:
q=pd.Period('2018Q3',freq='Q')
q=q+1
q
Out[1010]:
Period('2018Q4', 'Q-DEC')
In [1011]:
q.start_time
Out[1011]:
Timestamp('2018-10-01 00:00:00')
In [1012]:
q.end_time
Out[1012]:
Timestamp('2018-12-31 23:59:59.999999999')

Customized Quarter

In [1013]:
qc=pd.Period('2018', freq='Q-Jan')
qc
Out[1013]:
Period('2018Q4', 'Q-JAN')
In [1014]:
qc.start_time
Out[1014]:
Timestamp('2017-11-01 00:00:00')
In [1015]:
qc.end_time
Out[1015]:
Timestamp('2018-01-31 23:59:59.999999999')
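With freq='Q-JAN' the fiscal year ends in January, which is why '2018' resolves to 2018Q4 (November 2017 to January 2018) above. The sketch below looks at the first quarter of the same fiscal year; the variable names are illustrative.

```python
import pandas as pd

# First quarter of the fiscal year that ends in January 2018.
q1 = pd.Period('2018Q1', freq='Q-JAN')

# Convert the quarter to its first and last calendar month.
first_month = q1.asfreq('M', how='start')
last_month = q1.asfreq('M', how='end')
print(q1.start_time, first_month, last_month)
```

So fiscal 2018Q1 covers February to April 2017, three quarters before the 2018Q4 shown above.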

Arithmetic Functions

Example-1

In [1016]:
q1=pd.Period('2018', freq='Q-Jan')
In [1017]:
q1.start_time
Out[1017]:
Timestamp('2017-11-01 00:00:00')
In [1018]:
q1.end_time
Out[1018]:
Timestamp('2018-01-31 23:59:59.999999999')
In [1019]:
q2=pd.Period('2016', freq='Q-Jan')
In [1020]:
q2.start_time
Out[1020]:
Timestamp('2015-11-01 00:00:00')
In [1021]:
q2.end_time
Out[1021]:
Timestamp('2016-01-31 23:59:59.999999999')
In [1022]:
q3=q1-q2
q3
Out[1022]:
<8 * QuarterEnds: startingMonth=1>

Example-2

In [1023]:
q1=pd.Period('2018', freq='Q-Nov')
q2=pd.Period('2016', freq='Q-Nov')
q3=q1-q2
q3
Out[1023]:
<8 * QuarterEnds: startingMonth=11>
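
Subtracting two periods returns an offset object rather than a plain number, as the outputs above show. A sketch of how the integer count could be read from it, assuming a pandas version that returns an offset (as in the outputs above):

```python
import pandas as pd

q1 = pd.Period('2018', freq='Q-JAN')
q2 = pd.Period('2016', freq='Q-JAN')

diff = q1 - q2          # an offset such as <8 * QuarterEnds: startingMonth=1>
n_quarters = diff.n     # the integer multiple stored on the offset

print(n_quarters)
```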

to_period

  • to_period converts a DatetimeIndex into a PeriodIndex with the requested frequency (for example 'Q' for quarters).
In [1024]:
df= pd.read_csv('AAPL.csv',parse_dates=["Date"],index_col="Date")
df
Out[1024]:
Open High Low Close Adj Close Volume
Date
2018-01-02 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018-01-03 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900
2018-01-04 172.53999299999998 173.470001 172.080002 173.029999 170.47070300000001 22434600
2018-01-05 173.440002 175.369995 173.050003 175.0 172.41156 23660000
2018-01-08 174.350006 175.610001 173.929993 174.350006 171.77117900000002 20567800
2018-01-09 174.550003 175.059998 173.41000400000001 174.330002 171.751465 21584000
2018-01-10 173.16000400000001 174.300003 173.0 174.28999299999998 171.712051 23959900
2018-01-11 174.58999599999999 175.490005 174.490005 175.279999 172.687408 18667700
2018-01-12 176.179993 177.360001 175.649994 177.08999599999999 174.470642 25226000
2018-01-15 177.899994 179.389999 176.139999 176.190002 173.583969 29565900
2018-01-16 177.899994 179.389999 176.139999 176.190002 173.583969 29565900
2018-01-17 176.149994 179.25 175.070007 179.100006 176.450928 34386800
2018-01-18 179.369995 180.100006 178.25 179.259995 176.608551 31193400
2018-01-19 178.610001 179.580002 177.41000400000001 178.46000700000002 175.82038899999998 32425100
2018-01-22 177.300003 177.779999 176.600006 177.0 174.381973 27108600
2018-01-23 177.300003 179.440002 176.820007 177.03999299999998 174.421387 32689100
2018-01-24 177.25 177.300003 173.199997 174.220001 171.643082 51105100
2018-01-25 174.509995 174.949997 170.529999 171.110001 168.57908600000002 41529000
2018-01-26 172.0 172.0 170.059998 171.509995 168.973175 39143000
2018-01-29 170.16000400000001 170.16000400000001 167.070007 167.96000700000002 165.475677 50640400
2018-01-30 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018-01-31 166.869995 168.440002 166.5 167.429993 164.953522 32478900
2018-02-01 167.169998 168.619995 166.759995 167.779999 165.29835500000002 47230800
2018-02-02 166.0 166.800003 160.100006 160.5 158.126022 86593800
2018-02-05 159.100006 163.880005 156.0 156.490005 154.175354 72738500
2018-02-06 154.830002 163.720001 154.0 163.029999 160.61859099999998 68243800
2018-02-07 163.08999599999999 163.399994 159.070007 159.53999299999998 157.180222 51608600
2018-02-08 160.28999299999998 161.0 155.029999 155.149994 152.85514799999999 54390500
2018-02-09 157.070007 157.889999 150.240005 156.41000400000001 154.724808 70672600
2018-02-12 158.5 163.889999 157.509995 162.71000700000002 160.956924 60819500
... ... ... ... ... ... ...
2018-10-25 217.710007 221.38000499999998 216.75 219.80000299999998 219.03575099999998 29855800
2018-10-26 215.899994 220.190002 212.669998 216.30000299999998 215.547913 47258400
2018-10-29 219.190002 219.690002 206.08999599999999 212.24000499999997 211.502045 45935500
2018-10-30 211.149994 215.17999300000002 209.270004 213.30000299999998 212.55835 36660000
2018-10-31 216.88000499999998 220.44999700000002 216.61999500000002 218.86000099999998 218.099014 38358900
2018-11-01 219.05000299999998 222.36000099999998 216.809998 222.22000099999997 221.447327 58323200
2018-11-02 209.550003 213.649994 205.429993 207.479996 206.758575 91328700
2018-11-05 204.300003 204.389999 198.169998 201.58999599999999 200.889053 66163700
2018-11-06 201.919998 204.720001 201.690002 203.770004 203.06149299999998 31882900
2018-11-07 205.970001 210.059998 204.130005 209.94999700000002 209.219986 33424400
2018-11-08 209.979996 210.11999500000002 206.75 208.490005 208.490005 25362600
2018-11-09 205.550003 206.009995 202.25 204.470001 204.470001 34365800
2018-11-12 199.0 199.850006 193.78999299999998 194.169998 194.169998 51135500
2018-11-13 191.630005 197.179993 191.449997 192.229996 192.229996 46882900
2018-11-14 193.899994 194.479996 185.929993 186.800003 186.800003 60801000
2018-11-15 188.389999 191.970001 186.899994 191.41000400000001 191.41000400000001 46478800
2018-11-16 190.5 194.970001 189.46000700000002 193.529999 193.529999 36928300
2018-11-19 190.0 190.699997 184.990005 185.860001 185.860001 41925300
2018-11-20 178.369995 181.470001 175.509995 176.979996 176.979996 67825200
2018-11-21 179.729996 180.270004 176.550003 176.779999 176.779999 31124200
2018-11-23 174.940002 176.600006 172.100006 172.28999299999998 172.28999299999998 23624000
2018-11-26 174.240005 174.949997 170.259995 174.619995 174.619995 44738600
2018-11-27 171.509995 174.770004 170.880005 174.240005 174.240005 41387400
2018-11-28 176.729996 181.28999299999998 174.929993 180.940002 180.940002 46062500
2018-11-29 182.66000400000001 182.800003 177.699997 179.550003 179.550003 41770000
2018-11-30 180.28999299999998 180.330002 177.029999 178.580002 178.580002 39531500
2018-12-03 184.46000700000002 184.940002 181.21000700000002 184.820007 184.820007 40802500
2018-12-04 180.949997 182.389999 176.270004 176.690002 176.690002 41344300
2018-12-06 171.759995 174.779999 170.419998 174.720001 174.720001 43098400
2018-12-07 173.490005 174.490005 168.300003 168.490005 168.490005 41695700

237 rows × 6 columns

In [1025]:
df.to_period('Q')
Out[1025]:
Open High Low Close Adj Close Volume
Date
2018Q1 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
2018Q1 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900
2018Q1 172.53999299999998 173.470001 172.080002 173.029999 170.47070300000001 22434600
2018Q1 173.440002 175.369995 173.050003 175.0 172.41156 23660000
2018Q1 174.350006 175.610001 173.929993 174.350006 171.77117900000002 20567800
2018Q1 174.550003 175.059998 173.41000400000001 174.330002 171.751465 21584000
2018Q1 173.16000400000001 174.300003 173.0 174.28999299999998 171.712051 23959900
2018Q1 174.58999599999999 175.490005 174.490005 175.279999 172.687408 18667700
2018Q1 176.179993 177.360001 175.649994 177.08999599999999 174.470642 25226000
2018Q1 177.899994 179.389999 176.139999 176.190002 173.583969 29565900
2018Q1 177.899994 179.389999 176.139999 176.190002 173.583969 29565900
2018Q1 176.149994 179.25 175.070007 179.100006 176.450928 34386800
2018Q1 179.369995 180.100006 178.25 179.259995 176.608551 31193400
2018Q1 178.610001 179.580002 177.41000400000001 178.46000700000002 175.82038899999998 32425100
2018Q1 177.300003 177.779999 176.600006 177.0 174.381973 27108600
2018Q1 177.300003 179.440002 176.820007 177.03999299999998 174.421387 32689100
2018Q1 177.25 177.300003 173.199997 174.220001 171.643082 51105100
2018Q1 174.509995 174.949997 170.529999 171.110001 168.57908600000002 41529000
2018Q1 172.0 172.0 170.059998 171.509995 168.973175 39143000
2018Q1 170.16000400000001 170.16000400000001 167.070007 167.96000700000002 165.475677 50640400
2018Q1 165.529999 167.369995 164.699997 166.970001 164.500336 46048200
2018Q1 166.869995 168.440002 166.5 167.429993 164.953522 32478900
2018Q1 167.169998 168.619995 166.759995 167.779999 165.29835500000002 47230800
2018Q1 166.0 166.800003 160.100006 160.5 158.126022 86593800
2018Q1 159.100006 163.880005 156.0 156.490005 154.175354 72738500
2018Q1 154.830002 163.720001 154.0 163.029999 160.61859099999998 68243800
2018Q1 163.08999599999999 163.399994 159.070007 159.53999299999998 157.180222 51608600
2018Q1 160.28999299999998 161.0 155.029999 155.149994 152.85514799999999 54390500
2018Q1 157.070007 157.889999 150.240005 156.41000400000001 154.724808 70672600
2018Q1 158.5 163.889999 157.509995 162.71000700000002 160.956924 60819500
... ... ... ... ... ... ...
2018Q4 217.710007 221.38000499999998 216.75 219.80000299999998 219.03575099999998 29855800
2018Q4 215.899994 220.190002 212.669998 216.30000299999998 215.547913 47258400
2018Q4 219.190002 219.690002 206.08999599999999 212.24000499999997 211.502045 45935500
2018Q4 211.149994 215.17999300000002 209.270004 213.30000299999998 212.55835 36660000
2018Q4 216.88000499999998 220.44999700000002 216.61999500000002 218.86000099999998 218.099014 38358900
2018Q4 219.05000299999998 222.36000099999998 216.809998 222.22000099999997 221.447327 58323200
2018Q4 209.550003 213.649994 205.429993 207.479996 206.758575 91328700
2018Q4 204.300003 204.389999 198.169998 201.58999599999999 200.889053 66163700
2018Q4 201.919998 204.720001 201.690002 203.770004 203.06149299999998 31882900
2018Q4 205.970001 210.059998 204.130005 209.94999700000002 209.219986 33424400
2018Q4 209.979996 210.11999500000002 206.75 208.490005 208.490005 25362600
2018Q4 205.550003 206.009995 202.25 204.470001 204.470001 34365800
2018Q4 199.0 199.850006 193.78999299999998 194.169998 194.169998 51135500
2018Q4 191.630005 197.179993 191.449997 192.229996 192.229996 46882900
2018Q4 193.899994 194.479996 185.929993 186.800003 186.800003 60801000
2018Q4 188.389999 191.970001 186.899994 191.41000400000001 191.41000400000001 46478800
2018Q4 190.5 194.970001 189.46000700000002 193.529999 193.529999 36928300
2018Q4 190.0 190.699997 184.990005 185.860001 185.860001 41925300
2018Q4 178.369995 181.470001 175.509995 176.979996 176.979996 67825200
2018Q4 179.729996 180.270004 176.550003 176.779999 176.779999 31124200
2018Q4 174.940002 176.600006 172.100006 172.28999299999998 172.28999299999998 23624000
2018Q4 174.240005 174.949997 170.259995 174.619995 174.619995 44738600
2018Q4 171.509995 174.770004 170.880005 174.240005 174.240005 41387400
2018Q4 176.729996 181.28999299999998 174.929993 180.940002 180.940002 46062500
2018Q4 182.66000400000001 182.800003 177.699997 179.550003 179.550003 41770000
2018Q4 180.28999299999998 180.330002 177.029999 178.580002 178.580002 39531500
2018Q4 184.46000700000002 184.940002 181.21000700000002 184.820007 184.820007 40802500
2018Q4 180.949997 182.389999 176.270004 176.690002 176.690002 41344300
2018Q4 171.759995 174.779999 170.419998 174.720001 174.720001 43098400
2018Q4 173.490005 174.490005 168.300003 168.490005 168.490005 41695700

237 rows × 6 columns
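
Once the index is a PeriodIndex, rows that fall in the same quarter share one label and can be aggregated. The sketch below uses a tiny made-up frame (the prices are illustrative, not taken from AAPL.csv) and groups on the index:

```python
import pandas as pd

# Hypothetical daily closing prices, for illustration only
idx = pd.to_datetime(['2018-01-02', '2018-02-12', '2018-10-25', '2018-12-07'])
prices = pd.DataFrame({'Close': [172.0, 162.0, 219.0, 168.0]}, index=idx)

# DatetimeIndex -> PeriodIndex with quarterly labels
quarterly = prices.to_period('Q')

# Average the rows that share a quarter label
mean_by_quarter = quarterly.groupby(level=0).mean()
print(mean_by_quarter)
```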

offsets

Example-1

In [1026]:
df= pd.read_csv('AAPL.csv',parse_dates=['Date'])
df.head()
Out[1026]:
Date Open High Low Close Adj Close Volume
0 2018-01-02 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900
1 2018-01-03 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900
2 2018-01-04 172.53999299999998 173.470001 172.080002 173.029999 170.47070300000001 22434600
3 2018-01-05 173.440002 175.369995 173.050003 175.0 172.41156 23660000
4 2018-01-08 174.350006 175.610001 173.929993 174.350006 171.77117900000002 20567800
In [1027]:
df["NEW_DATE"] = df['Date'] - pd.offsets.DateOffset(years=1)
df.head()
Out[1027]:
Date Open High Low Close Adj Close Volume NEW_DATE
0 2018-01-02 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900 2017-01-02
1 2018-01-03 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900 2017-01-03
2 2018-01-04 172.53999299999998 173.470001 172.080002 173.029999 170.47070300000001 22434600 2017-01-04
3 2018-01-05 173.440002 175.369995 173.050003 175.0 172.41156 23660000 2017-01-05
4 2018-01-08 174.350006 175.610001 173.929993 174.350006 171.77117900000002 20567800 2017-01-08

Example-2

In [1028]:
df= pd.read_csv('AAPL.csv',parse_dates=['Date'])
df["NEW_DATE"] = df['Date'] + pd.offsets.Day(10)
df.head()
Out[1028]:
Date Open High Low Close Adj Close Volume NEW_DATE
0 2018-01-02 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900 2018-01-12
1 2018-01-03 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900 2018-01-13
2 2018-01-04 172.53999299999998 173.470001 172.080002 173.029999 170.47070300000001 22434600 2018-01-14
3 2018-01-05 173.440002 175.369995 173.050003 175.0 172.41156 23660000 2018-01-15
4 2018-01-08 174.350006 175.610001 173.929993 174.350006 171.77117900000002 20567800 2018-01-18

Example-3

In [1029]:
df= pd.read_csv('AAPL.csv',parse_dates=['Date'])
df["NEW_DATE"] = df['Date'] + pd.offsets.BDay(10)
df.head()
Out[1029]:
Date Open High Low Close Adj Close Volume NEW_DATE
0 2018-01-02 170.16000400000001 172.300003 169.259995 172.259995 169.712067 25555900 2018-01-16
1 2018-01-03 172.529999 174.550003 171.96000700000002 172.229996 169.68251 29517900 2018-01-17
2 2018-01-04 172.53999299999998 173.470001 172.080002 173.029999 170.47070300000001 22434600 2018-01-18
3 2018-01-05 173.440002 175.369995 173.050003 175.0 172.41156 23660000 2018-01-19
4 2018-01-08 174.350006 175.610001 173.929993 174.350006 171.77117900000002 20567800 2018-01-22
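
Example-2 and Example-3 differ because Day counts calendar days while BDay skips Saturdays and Sundays (it does not know about public holidays). A minimal sketch on a single timestamp, with illustrative names cal and biz:

```python
import pandas as pd

start = pd.Timestamp('2018-01-05')      # a Friday

cal = start + pd.offsets.Day(1)         # next calendar day (Saturday)
biz = start + pd.offsets.BDay(1)        # next business day (Monday)

print(cal.date(), biz.date())
```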

Example-4

In [1030]:
h=pd.Period('2018-12-15 20:00:00',freq='H')
h=h+pd.offsets.Hour(1)
h
Out[1030]:
Period('2018-12-15 21:00', 'H')
In [1031]:
h.start_time
Out[1031]:
Timestamp('2018-12-15 21:00:00')
In [1032]:
h.end_time
Out[1032]:
Timestamp('2018-12-15 21:59:59.999999999')

Example-5

In [1033]:
h=pd.Period('2018-12-15 20:00:00',freq='H')
h=h+pd.offsets.Hour(-22)
h
Out[1033]:
Period('2018-12-14 22:00', 'H')
In [1034]:
h.start_time
Out[1034]:
Timestamp('2018-12-14 22:00:00')
In [1035]:
h.end_time
Out[1035]:
Timestamp('2018-12-14 22:59:59.999999999')

asfreq Function

Example-1

In [1036]:
q=pd.Period('2018-5',freq='Q')
In [1037]:
q.start_time
Out[1037]:
Timestamp('2018-04-01 00:00:00')
In [1038]:
q.end_time
Out[1038]:
Timestamp('2018-06-30 23:59:59.999999999')
In [1039]:
q=q.asfreq('M',how='start')
In [1040]:
q.start_time
Out[1040]:
Timestamp('2018-04-01 00:00:00')
In [1041]:
q.end_time
Out[1041]:
Timestamp('2018-04-30 23:59:59.999999999')

Example-2

In [1042]:
q=pd.Period('2018-5',freq='Q')
In [1043]:
q.start_time
Out[1043]:
Timestamp('2018-04-01 00:00:00')
In [1044]:
q.end_time
Out[1044]:
Timestamp('2018-06-30 23:59:59.999999999')
In [1045]:
q=q.asfreq('M',how='end')
In [1046]:
q.start_time
Out[1046]:
Timestamp('2018-06-01 00:00:00')
In [1047]:
q.end_time
Out[1047]:
Timestamp('2018-06-30 23:59:59.999999999')
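
The same how='start' / how='end' choice applies to any finer target frequency. As a sketch, converting a quarter to daily periods (first_day and last_day are illustrative names):

```python
import pandas as pd

q = pd.Period('2018Q2', freq='Q')

first_day = q.asfreq('D', how='start')   # first day of the quarter
last_day = q.asfreq('D', how='end')      # last day of the quarter

print(first_day, last_day)
```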

PeriodIndex

In [1048]:
from IPython.display import YouTubeVideo
YouTubeVideo('3l9YOS4y24Y',width=900, height=500)
Out[1048]:

Example-1

In [1049]:
df=pd.period_range('2011','2018',freq='Q')
df
Out[1049]:
PeriodIndex(['2011Q1', '2011Q2', '2011Q3', '2011Q4', '2012Q1', '2012Q2',
             '2012Q3', '2012Q4', '2013Q1', '2013Q2', '2013Q3', '2013Q4',
             '2014Q1', '2014Q2', '2014Q3', '2014Q4', '2015Q1', '2015Q2',
             '2015Q3', '2015Q4', '2016Q1', '2016Q2', '2016Q3', '2016Q4',
             '2017Q1', '2017Q2', '2017Q3', '2017Q4', '2018Q1'],
            dtype='period[Q-DEC]', freq='Q-DEC')

Example-2

In [1050]:
df=pd.period_range('2011','2018',freq='Q-Jan')
df
Out[1050]:
PeriodIndex(['2011Q4', '2012Q1', '2012Q2', '2012Q3', '2012Q4', '2013Q1',
             '2013Q2', '2013Q3', '2013Q4', '2014Q1', '2014Q2', '2014Q3',
             '2014Q4', '2015Q1', '2015Q2', '2015Q3', '2015Q4', '2016Q1',
             '2016Q2', '2016Q3', '2016Q4', '2017Q1', '2017Q2', '2017Q3',
             '2017Q4', '2018Q1', '2018Q2', '2018Q3', '2018Q4'],
            dtype='period[Q-JAN]', freq='Q-JAN')
In [1051]:
df[0].start_time
Out[1051]:
Timestamp('2010-11-01 00:00:00')
In [1052]:
df[0].end_time
Out[1052]:
Timestamp('2011-01-31 23:59:59.999999999')
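
Instead of an end point, period_range also accepts a number of periods. A short sketch, assuming four quarters starting at 2011Q1 (the name four_quarters is illustrative):

```python
import pandas as pd

# Start label plus a count instead of start and end
four_quarters = pd.period_range(start='2011Q1', periods=4, freq='Q')

labels = [str(p) for p in four_quarters]
print(labels)
```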

Example-3

In [1053]:
df= pd.read_csv('ind_data.csv')
df
Out[1053]:
idx Values
0 2011Q1 20000
1 2011Q2 40000
2 2011Q3 60000
3 2011Q4 80000
4 2012Q1 100000
5 2012Q2 120000
6 2012Q3 140000
7 2012Q4 160000
8 2013Q1 180000
9 2013Q2 200000
10 2013Q3 220000
11 2013Q4 240000
In [1054]:
df.set_index('idx',inplace=True)
In [1055]:
df
Out[1055]:
Values
idx
2011Q1 20000
2011Q2 40000
2011Q3 60000
2011Q4 80000
2012Q1 100000
2012Q2 120000
2012Q3 140000
2012Q4 160000
2013Q1 180000
2013Q2 200000
2013Q3 220000
2013Q4 240000
In [1056]:
df.index
Out[1056]:
Index(['2011Q1', '2011Q2', '2011Q3', '2011Q4', '2012Q1', '2012Q2', '2012Q3',
       '2012Q4', '2013Q1', '2013Q2', '2013Q3', '2013Q4'],
      dtype='object', name='idx')
In [1057]:
df.index= pd.PeriodIndex(df.index,freq='Q')
In [1058]:
df.index
Out[1058]:
PeriodIndex(['2011Q1', '2011Q2', '2011Q3', '2011Q4', '2012Q1', '2012Q2',
             '2012Q3', '2012Q4', '2013Q1', '2013Q2', '2013Q3', '2013Q4'],
            dtype='period[Q-DEC]', name='idx', freq='Q-DEC')
In [1059]:
# Select rows by partial string; newer pandas versions require .loc for this
df.loc['2012']
Out[1059]:
Values
idx
2012Q1 100000
2012Q2 120000
2012Q3 140000
2012Q4 160000
In [1060]:
df['2011':'2012']
Out[1060]:
Values
idx
2011Q1 20000
2011Q2 40000
2011Q3 60000
2011Q4 80000
2012Q1 100000
2012Q2 120000
2012Q3 140000
2012Q4 160000
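
The row selection in Example-3 can be reproduced without ind_data.csv. The sketch below builds a similar frame in memory (the name sales and the generated values are assumptions that mirror the pattern of the file) and selects rows by year through .loc:

```python
import pandas as pd

# Eight quarters with values 20000, 40000, ..., 160000
idx = pd.period_range('2011Q1', '2012Q4', freq='Q')
sales = pd.DataFrame({'Values': list(range(20000, 180000, 20000))}, index=idx)

year_2012 = sales.loc['2012']           # all quarters of 2012
both_years = sales.loc['2011':'2012']   # slice covering both years

print(year_2012)
```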

Example-4

In [1061]:
df= pd.read_csv('ts_ds.csv')
df
Out[1061]:
LineItem 2017Q1 2017Q2 2017Q3 2017Q4 2018Q1
0 Revenue 115904 120854 118179 130936 117542
1 Expenses 86544 89485 87484 97743 87688
2 Profit 29360 31369 33193 33193 29854

Required Output

In [1062]:
from IPython.display import Image
Image(filename='Required_Output.png')
Out[1062]:
In [1063]:
df.set_index('LineItem', inplace = True)
In [1064]:
df
Out[1064]:
2017Q1 2017Q2 2017Q3 2017Q4 2018Q1
LineItem
Revenue 115904 120854 118179 130936 117542
Expenses 86544 89485 87484 97743 87688
Profit 29360 31369 33193 33193 29854
In [1065]:
df=df.T
df
Out[1065]:
LineItem Revenue Expenses Profit
2017Q1 115904 86544 29360
2017Q2 120854 89485 31369
2017Q3 118179 87484 33193
2017Q4 130936 97743 33193
2018Q1 117542 87688 29854
In [1066]:
df.index
Out[1066]:
Index(['2017Q1', '2017Q2', '2017Q3', '2017Q4', '2018Q1'], dtype='object')
In [1067]:
df.index= pd.PeriodIndex(df.index,freq='Q-Jan')
In [1068]:
df['Start Date']=df.index.map(lambda x: x.start_time)
df['End Date']=df.index.map(lambda x: x.end_time.floor("H"))
df
Out[1068]:
LineItem Revenue Expenses Profit Start Date End Date
2017Q1 115904 86544 29360 2016-02-01 2016-04-30 23:00:00
2017Q2 120854 89485 31369 2016-05-01 2016-07-31 23:00:00
2017Q3 118179 87484 33193 2016-08-01 2016-10-31 23:00:00
2017Q4 130936 97743 33193 2016-11-01 2017-01-31 23:00:00
2018Q1 117542 87688 29854 2017-02-01 2017-04-30 23:00:00
In [1069]:
# Day name (dt.weekday_name was removed in pandas 1.0; dt.day_name() replaces it)
df['DayName']=df['Start Date'].dt.day_name()

# Day of the month
df['dayc']=df['Start Date'].dt.day

# Month number (1-12)
df['month']=df['Start Date'].dt.month
# Day of the year
df['daynumber']=df['Start Date'].dt.dayofyear
# Calendar quarter number (not the fiscal quarter of the index)
df['quarterernumb']=df['Start Date'].dt.quarter

# Day of the week, with Monday=0 and Sunday=6
df['dow']=df['Start Date'].dt.dayofweek
# Total number of days in the month
df['dim']=df['Start Date'].dt.days_in_month

# Year
df['yeartest']=df['Start Date'].dt.year

# Week of the year (dt.week was removed in pandas 2.0; use dt.isocalendar().week there)
df['weekn']=df['Start Date'].dt.week

df
Out[1069]:
LineItem Revenue Expenses Profit Start Date End Date DayName dayc month daynumber quarterernumb dow dim yeartest weekn
2017Q1 115904 86544 29360 2016-02-01 2016-04-30 23:00:00 Monday 1 2 32 1 0 29 2016 5
2017Q2 120854 89485 31369 2016-05-01 2016-07-31 23:00:00 Sunday 1 5 122 2 6 31 2016 17
2017Q3 118179 87484 33193 2016-08-01 2016-10-31 23:00:00 Monday 1 8 214 3 0 31 2016 31
2017Q4 130936 97743 33193 2016-11-01 2017-01-31 23:00:00 Tuesday 1 11 306 4 1 30 2016 44
2018Q1 117542 87688 29854 2017-02-01 2017-04-30 23:00:00 Wednesday 1 2 32 1 2 28 2017 5
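
The Start Date and End Date columns in Example-4 are built with map and a lambda. As an alternative sketch (not the author's method), a PeriodIndex exposes start_time and end_time directly; the two-element index below is illustrative:

```python
import pandas as pd

idx = pd.PeriodIndex(['2017Q1', '2017Q2'], freq='Q-JAN')

starts = idx.start_time             # DatetimeIndex of period starts
ends = idx.end_time.normalize()     # period ends, truncated to midnight

print(starts)
print(ends)
```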

TimeZone

In [1070]:
from IPython.display import Image
Image(filename='TimeZone.png')
Out[1070]:
  • Python has two types of datetime objects:
    • 1) Naive (no time zone awareness)
    • 2) Time zone aware
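
The two kinds of datetime object can be told apart by their tzinfo attribute. A minimal sketch using only the standard library, assuming a fixed +05:00 offset (the name pkt is illustrative):

```python
from datetime import datetime, timedelta, timezone

# Naive: no time zone attached
naive = datetime(2018, 12, 16, 9, 0)

# Aware: carries a fixed UTC+05:00 offset
pkt = timezone(timedelta(hours=5))
aware = datetime(2018, 12, 16, 9, 0, tzinfo=pkt)

print(naive.tzinfo)
print(aware.utcoffset())
```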
In [1071]:
from IPython.display import YouTubeVideo
YouTubeVideo('9IW2GIJajLs',width=900, height=500)
Out[1071]:
In [1072]:
from pytz import all_timezones
In [1073]:
all_timezones
Out[1073]:
['Africa/Abidjan',
 'Africa/Accra',
 'Africa/Addis_Ababa',
 'Africa/Algiers',
 'Africa/Asmara',
 'Africa/Asmera',
 'Africa/Bamako',
 'Africa/Bangui',
 'Africa/Banjul',
 'Africa/Bissau',
 'Africa/Blantyre',
 'Africa/Brazzaville',
 'Africa/Bujumbura',
 'Africa/Cairo',
 'Africa/Casablanca',
 'Africa/Ceuta',
 'Africa/Conakry',
 'Africa/Dakar',
 'Africa/Dar_es_Salaam',
 'Africa/Djibouti',
 'Africa/Douala',
 'Africa/El_Aaiun',
 'Africa/Freetown',
 'Africa/Gaborone',
 'Africa/Harare',
 'Africa/Johannesburg',
 'Africa/Juba',
 'Africa/Kampala',
 'Africa/Khartoum',
 'Africa/Kigali',
 'Africa/Kinshasa',
 'Africa/Lagos',
 'Africa/Libreville',
 'Africa/Lome',
 'Africa/Luanda',
 'Africa/Lubumbashi',
 'Africa/Lusaka',
 'Africa/Malabo',
 'Africa/Maputo',
 'Africa/Maseru',
 'Africa/Mbabane',
 'Africa/Mogadishu',
 'Africa/Monrovia',
 'Africa/Nairobi',
 'Africa/Ndjamena',
 'Africa/Niamey',
 'Africa/Nouakchott',
 'Africa/Ouagadougou',
 'Africa/Porto-Novo',
 'Africa/Sao_Tome',
 'Africa/Timbuktu',
 'Africa/Tripoli',
 'Africa/Tunis',
 'Africa/Windhoek',
 'America/Adak',
 'America/Anchorage',
 'America/Anguilla',
 'America/Antigua',
 'America/Araguaina',
 'America/Argentina/Buenos_Aires',
 'America/Argentina/Catamarca',
 'America/Argentina/ComodRivadavia',
 'America/Argentina/Cordoba',
 'America/Argentina/Jujuy',
 'America/Argentina/La_Rioja',
 'America/Argentina/Mendoza',
 'America/Argentina/Rio_Gallegos',
 'America/Argentina/Salta',
 'America/Argentina/San_Juan',
 'America/Argentina/San_Luis',
 'America/Argentina/Tucuman',
 'America/Argentina/Ushuaia',
 'America/Aruba',
 'America/Asuncion',
 'America/Atikokan',
 'America/Atka',
 'America/Bahia',
 'America/Bahia_Banderas',
 'America/Barbados',
 'America/Belem',
 'America/Belize',
 'America/Blanc-Sablon',
 'America/Boa_Vista',
 'America/Bogota',
 'America/Boise',
 'America/Buenos_Aires',
 'America/Cambridge_Bay',
 'America/Campo_Grande',
 'America/Cancun',
 'America/Caracas',
 'America/Catamarca',
 'America/Cayenne',
 'America/Cayman',
 'America/Chicago',
 'America/Chihuahua',
 'America/Coral_Harbour',
 'America/Cordoba',
 'America/Costa_Rica',
 'America/Creston',
 'America/Cuiaba',
 'America/Curacao',
 'America/Danmarkshavn',
 'America/Dawson',
 'America/Dawson_Creek',
 'America/Denver',
 'America/Detroit',
 'America/Dominica',
 'America/Edmonton',
 'America/Eirunepe',
 'America/El_Salvador',
 'America/Ensenada',
 'America/Fort_Nelson',
 'America/Fort_Wayne',
 'America/Fortaleza',
 'America/Glace_Bay',
 'America/Godthab',
 'America/Goose_Bay',
 'America/Grand_Turk',
 'America/Grenada',
 'America/Guadeloupe',
 'America/Guatemala',
 'America/Guayaquil',
 'America/Guyana',
 'America/Halifax',
 'America/Havana',
 'America/Hermosillo',
 'America/Indiana/Indianapolis',
 'America/Indiana/Knox',
 'America/Indiana/Marengo',
 'America/Indiana/Petersburg',
 'America/Indiana/Tell_City',
 'America/Indiana/Vevay',
 'America/Indiana/Vincennes',
 'America/Indiana/Winamac',
 'America/Indianapolis',
 'America/Inuvik',
 'America/Iqaluit',
 'America/Jamaica',
 'America/Jujuy',
 'America/Juneau',
 'America/Kentucky/Louisville',
 'America/Kentucky/Monticello',
 'America/Knox_IN',
 'America/Kralendijk',
 'America/La_Paz',
 'America/Lima',
 'America/Los_Angeles',
 'America/Louisville',
 'America/Lower_Princes',
 'America/Maceio',
 'America/Managua',
 'America/Manaus',
 'America/Marigot',
 'America/Martinique',
 'America/Matamoros',
 'America/Mazatlan',
 'America/Mendoza',
 'America/Menominee',
 'America/Merida',
 'America/Metlakatla',
 'America/Mexico_City',
 'America/Miquelon',
 'America/Moncton',
 'America/Monterrey',
 'America/Montevideo',
 'America/Montreal',
 'America/Montserrat',
 'America/Nassau',
 'America/New_York',
 'America/Nipigon',
 'America/Nome',
 'America/Noronha',
 'America/North_Dakota/Beulah',
 'America/North_Dakota/Center',
 'America/North_Dakota/New_Salem',
 'America/Ojinaga',
 'America/Panama',
 'America/Pangnirtung',
 'America/Paramaribo',
 'America/Phoenix',
 'America/Port-au-Prince',
 'America/Port_of_Spain',
 'America/Porto_Acre',
 'America/Porto_Velho',
 'America/Puerto_Rico',
 'America/Punta_Arenas',
 'America/Rainy_River',
 'America/Rankin_Inlet',
 'America/Recife',
 'America/Regina',
 'America/Resolute',
 'America/Rio_Branco',
 'America/Rosario',
 'America/Santa_Isabel',
 'America/Santarem',
 'America/Santiago',
 'America/Santo_Domingo',
 'America/Sao_Paulo',
 'America/Scoresbysund',
 'America/Shiprock',
 'America/Sitka',
 'America/St_Barthelemy',
 'America/St_Johns',
 'America/St_Kitts',
 'America/St_Lucia',
 'America/St_Thomas',
 'America/St_Vincent',
 'America/Swift_Current',
 'America/Tegucigalpa',
 'America/Thule',
 'America/Thunder_Bay',
 'America/Tijuana',
 'America/Toronto',
 'America/Tortola',
 'America/Vancouver',
 'America/Virgin',
 'America/Whitehorse',
 'America/Winnipeg',
 'America/Yakutat',
 'America/Yellowknife',
 'Antarctica/Casey',
 'Antarctica/Davis',
 'Antarctica/DumontDUrville',
 'Antarctica/Macquarie',
 'Antarctica/Mawson',
 'Antarctica/McMurdo',
 'Antarctica/Palmer',
 'Antarctica/Rothera',
 'Antarctica/South_Pole',
 'Antarctica/Syowa',
 'Antarctica/Troll',
 'Antarctica/Vostok',
 'Arctic/Longyearbyen',
 'Asia/Aden',
 'Asia/Almaty',
 'Asia/Amman',
 'Asia/Anadyr',
 'Asia/Aqtau',
 'Asia/Aqtobe',
 'Asia/Ashgabat',
 'Asia/Ashkhabad',
 'Asia/Atyrau',
 'Asia/Baghdad',
 'Asia/Bahrain',
 'Asia/Baku',
 'Asia/Bangkok',
 'Asia/Barnaul',
 'Asia/Beirut',
 'Asia/Bishkek',
 'Asia/Brunei',
 'Asia/Calcutta',
 'Asia/Chita',
 'Asia/Choibalsan',
 'Asia/Chongqing',
 'Asia/Chungking',
 'Asia/Colombo',
 'Asia/Dacca',
 'Asia/Damascus',
 'Asia/Dhaka',
 'Asia/Dili',
 'Asia/Dubai',
 'Asia/Dushanbe',
 'Asia/Famagusta',
 'Asia/Gaza',
 'Asia/Harbin',
 'Asia/Hebron',
 'Asia/Ho_Chi_Minh',
 'Asia/Hong_Kong',
 'Asia/Hovd',
 'Asia/Irkutsk',
 'Asia/Istanbul',
 'Asia/Jakarta',
 'Asia/Jayapura',
 'Asia/Jerusalem',
 'Asia/Kabul',
 'Asia/Kamchatka',
 'Asia/Karachi',
 'Asia/Kashgar',
 'Asia/Kathmandu',
 'Asia/Katmandu',
 'Asia/Khandyga',
 'Asia/Kolkata',
 'Asia/Krasnoyarsk',
 'Asia/Kuala_Lumpur',
 'Asia/Kuching',
 'Asia/Kuwait',
 'Asia/Macao',
 'Asia/Macau',
 'Asia/Magadan',
 'Asia/Makassar',
 'Asia/Manila',
 'Asia/Muscat',
 'Asia/Nicosia',
 'Asia/Novokuznetsk',
 'Asia/Novosibirsk',
 'Asia/Omsk',
 'Asia/Oral',
 'Asia/Phnom_Penh',
 'Asia/Pontianak',
 'Asia/Pyongyang',
 'Asia/Qatar',
 'Asia/Qostanay',
 'Asia/Qyzylorda',
 'Asia/Rangoon',
 'Asia/Riyadh',
 'Asia/Saigon',
 'Asia/Sakhalin',
 'Asia/Samarkand',
 'Asia/Seoul',
 'Asia/Shanghai',
 'Asia/Singapore',
 'Asia/Srednekolymsk',
 'Asia/Taipei',
 'Asia/Tashkent',
 'Asia/Tbilisi',
 'Asia/Tehran',
 'Asia/Tel_Aviv',
 'Asia/Thimbu',
 'Asia/Thimphu',
 'Asia/Tokyo',
 'Asia/Tomsk',
 'Asia/Ujung_Pandang',
 'Asia/Ulaanbaatar',
 'Asia/Ulan_Bator',
 'Asia/Urumqi',
 'Asia/Ust-Nera',
 'Asia/Vientiane',
 'Asia/Vladivostok',
 'Asia/Yakutsk',
 'Asia/Yangon',
 'Asia/Yekaterinburg',
 'Asia/Yerevan',
 'Atlantic/Azores',
 'Atlantic/Bermuda',
 'Atlantic/Canary',
 'Atlantic/Cape_Verde',
 'Atlantic/Faeroe',
 'Atlantic/Faroe',
 'Atlantic/Jan_Mayen',
 'Atlantic/Madeira',
 'Atlantic/Reykjavik',
 'Atlantic/South_Georgia',
 'Atlantic/St_Helena',
 'Atlantic/Stanley',
 'Australia/ACT',
 'Australia/Adelaide',
 'Australia/Brisbane',
 'Australia/Broken_Hill',
 'Australia/Canberra',
 'Australia/Currie',
 'Australia/Darwin',
 'Australia/Eucla',
 'Australia/Hobart',
 'Australia/LHI',
 'Australia/Lindeman',
 'Australia/Lord_Howe',
 'Australia/Melbourne',
 'Australia/NSW',
 'Australia/North',
 'Australia/Perth',
 'Australia/Queensland',
 'Australia/South',
 'Australia/Sydney',
 'Australia/Tasmania',
 'Australia/Victoria',
 'Australia/West',
 'Australia/Yancowinna',
 'Brazil/Acre',
 'Brazil/DeNoronha',
 'Brazil/East',
 'Brazil/West',
 'CET',
 'CST6CDT',
 'Canada/Atlantic',
 'Canada/Central',
 'Canada/Eastern',
 'Canada/Mountain',
 'Canada/Newfoundland',
 'Canada/Pacific',
 'Canada/Saskatchewan',
 'Canada/Yukon',
 'Chile/Continental',
 'Chile/EasterIsland',
 'Cuba',
 'EET',
 'EST',
 'EST5EDT',
 'Egypt',
 'Eire',
 'Etc/GMT',
 'Etc/GMT+0',
 'Etc/GMT+1',
 'Etc/GMT+10',
 'Etc/GMT+11',
 'Etc/GMT+12',
 'Etc/GMT+2',
 'Etc/GMT+3',
 'Etc/GMT+4',
 'Etc/GMT+5',
 'Etc/GMT+6',
 'Etc/GMT+7',
 'Etc/GMT+8',
 'Etc/GMT+9',
 'Etc/GMT-0',
 'Etc/GMT-1',
 'Etc/GMT-10',
 'Etc/GMT-11',
 'Etc/GMT-12',
 'Etc/GMT-13',
 'Etc/GMT-14',
 'Etc/GMT-2',
 'Etc/GMT-3',
 'Etc/GMT-4',
 'Etc/GMT-5',
 'Etc/GMT-6',
 'Etc/GMT-7',
 'Etc/GMT-8',
 'Etc/GMT-9',
 'Etc/GMT0',
 'Etc/Greenwich',
 'Etc/UCT',
 'Etc/UTC',
 'Etc/Universal',
 'Etc/Zulu',
 'Europe/Amsterdam',
 'Europe/Andorra',
 'Europe/Astrakhan',
 'Europe/Athens',
 'Europe/Belfast',
 'Europe/Belgrade',
 'Europe/Berlin',
 'Europe/Bratislava',
 'Europe/Brussels',
 'Europe/Bucharest',
 'Europe/Budapest',
 'Europe/Busingen',
 'Europe/Chisinau',
 'Europe/Copenhagen',
 'Europe/Dublin',
 'Europe/Gibraltar',
 'Europe/Guernsey',
 'Europe/Helsinki',
 'Europe/Isle_of_Man',
 'Europe/Istanbul',
 'Europe/Jersey',
 'Europe/Kaliningrad',
 'Europe/Kiev',
 'Europe/Kirov',
 'Europe/Lisbon',
 'Europe/Ljubljana',
 'Europe/London',
 'Europe/Luxembourg',
 'Europe/Madrid',
 'Europe/Malta',
 'Europe/Mariehamn',
 'Europe/Minsk',
 'Europe/Monaco',
 'Europe/Moscow',
 'Europe/Nicosia',
 'Europe/Oslo',
 'Europe/Paris',
 'Europe/Podgorica',
 'Europe/Prague',
 'Europe/Riga',
 'Europe/Rome',
 'Europe/Samara',
 'Europe/San_Marino',
 'Europe/Sarajevo',
 'Europe/Saratov',
 'Europe/Simferopol',
 'Europe/Skopje',
 'Europe/Sofia',
 'Europe/Stockholm',
 'Europe/Tallinn',
 'Europe/Tirane',
 'Europe/Tiraspol',
 'Europe/Ulyanovsk',
 'Europe/Uzhgorod',
 'Europe/Vaduz',
 'Europe/Vatican',
 'Europe/Vienna',
 'Europe/Vilnius',
 'Europe/Volgograd',
 'Europe/Warsaw',
 'Europe/Zagreb',
 'Europe/Zaporozhye',
 'Europe/Zurich',
 'GB',
 'GB-Eire',
 'GMT',
 'GMT+0',
 'GMT-0',
 'GMT0',
 'Greenwich',
 'HST',
 'Hongkong',
 'Iceland',
 'Indian/Antananarivo',
 'Indian/Chagos',
 'Indian/Christmas',
 'Indian/Cocos',
 'Indian/Comoro',
 'Indian/Kerguelen',
 'Indian/Mahe',
 'Indian/Maldives',
 'Indian/Mauritius',
 'Indian/Mayotte',
 'Indian/Reunion',
 'Iran',
 'Israel',
 'Jamaica',
 'Japan',
 'Kwajalein',
 'Libya',
 'MET',
 'MST',
 'MST7MDT',
 'Mexico/BajaNorte',
 'Mexico/BajaSur',
 'Mexico/General',
 'NZ',
 'NZ-CHAT',
 'Navajo',
 'PRC',
 'PST8PDT',
 'Pacific/Apia',
 'Pacific/Auckland',
 'Pacific/Bougainville',
 'Pacific/Chatham',
 'Pacific/Chuuk',
 'Pacific/Easter',
 'Pacific/Efate',
 'Pacific/Enderbury',
 'Pacific/Fakaofo',
 'Pacific/Fiji',
 'Pacific/Funafuti',
 'Pacific/Galapagos',
 'Pacific/Gambier',
 'Pacific/Guadalcanal',
 'Pacific/Guam',
 'Pacific/Honolulu',
 'Pacific/Johnston',
 'Pacific/Kiritimati',
 'Pacific/Kosrae',
 'Pacific/Kwajalein',
 'Pacific/Majuro',
 'Pacific/Marquesas',
 'Pacific/Midway',
 'Pacific/Nauru',
 'Pacific/Niue',
 'Pacific/Norfolk',
 'Pacific/Noumea',
 'Pacific/Pago_Pago',
 'Pacific/Palau',
 'Pacific/Pitcairn',
 'Pacific/Pohnpei',
 'Pacific/Ponape',
 'Pacific/Port_Moresby',
 'Pacific/Rarotonga',
 'Pacific/Saipan',
 'Pacific/Samoa',
 'Pacific/Tahiti',
 'Pacific/Tarawa',
 'Pacific/Tongatapu',
 'Pacific/Truk',
 'Pacific/Wake',
 'Pacific/Wallis',
 'Pacific/Yap',
 'Poland',
 'Portugal',
 'ROC',
 'ROK',
 'Singapore',
 'Turkey',
 'UCT',
 'US/Alaska',
 'US/Aleutian',
 'US/Arizona',
 'US/Central',
 'US/East-Indiana',
 'US/Eastern',
 'US/Hawaii',
 'US/Indiana-Starke',
 'US/Michigan',
 'US/Mountain',
 'US/Pacific',
 'US/Samoa',
 'UTC',
 'Universal',
 'W-SU',
 'WET',
 'Zulu']

Example-1

In [1074]:
df=pd.read_csv('TZH.csv',parse_dates=['Date Time'])
df
Out[1074]:
Date Time Price
0 2018-12-16 09:00:00 72.38
1 2018-12-16 09:15:00 71.0
2 2018-12-16 09:30:00 71.67
3 2018-12-16 10:00:00 72.82
4 2018-12-16 10:30:00 73.0
5 2018-12-16 11:00:00 72.5
In [1075]:
df.set_index('Date Time', inplace = True)
In [1076]:
df
Out[1076]:
Price
Date Time
2018-12-16 09:00:00 72.38
2018-12-16 09:15:00 71.0
2018-12-16 09:30:00 71.67
2018-12-16 10:00:00 72.82
2018-12-16 10:30:00 73.0
2018-12-16 11:00:00 72.5
In [1077]:
df.index
Out[1077]:
DatetimeIndex(['2018-12-16 09:00:00', '2018-12-16 09:15:00',
               '2018-12-16 09:30:00', '2018-12-16 10:00:00',
               '2018-12-16 10:30:00', '2018-12-16 11:00:00'],
              dtype='datetime64[ns]', name='Date Time', freq=None)
In [1078]:
df=df.tz_localize(tz='Asia/Istanbul')
In [1079]:
df
Out[1079]:
Price
Date Time
2018-12-16 09:00:00+03:00 72.38
2018-12-16 09:15:00+03:00 71.0
2018-12-16 09:30:00+03:00 71.67
2018-12-16 10:00:00+03:00 72.82
2018-12-16 10:30:00+03:00 73.0
2018-12-16 11:00:00+03:00 72.5
In [1080]:
df.index
Out[1080]:
DatetimeIndex(['2018-12-16 09:00:00+03:00', '2018-12-16 09:15:00+03:00',
               '2018-12-16 09:30:00+03:00', '2018-12-16 10:00:00+03:00',
               '2018-12-16 10:30:00+03:00', '2018-12-16 11:00:00+03:00'],
              dtype='datetime64[ns, Asia/Istanbul]', name='Date Time', freq=None)

Example-2

In [1081]:
df=df.tz_convert(tz='Asia/Karachi')
df
Out[1081]:
Price
Date Time
2018-12-16 11:00:00+05:00 72.38
2018-12-16 11:15:00+05:00 71.0
2018-12-16 11:30:00+05:00 71.67
2018-12-16 12:00:00+05:00 72.82
2018-12-16 12:30:00+05:00 73.0
2018-12-16 13:00:00+05:00 72.5
In [1082]:
df.index
Out[1082]:
DatetimeIndex(['2018-12-16 11:00:00+05:00', '2018-12-16 11:15:00+05:00',
               '2018-12-16 11:30:00+05:00', '2018-12-16 12:00:00+05:00',
               '2018-12-16 12:30:00+05:00', '2018-12-16 13:00:00+05:00'],
              dtype='datetime64[ns, Asia/Karachi]', name='Date Time', freq=None)
In [1083]:
df=df.tz_convert(tz='dateutil/Europe/Berlin')
df
Out[1083]:
Price
Date Time
2018-12-16 07:00:00+01:00 72.38
2018-12-16 07:15:00+01:00 71.0
2018-12-16 07:30:00+01:00 71.67
2018-12-16 08:00:00+01:00 72.82
2018-12-16 08:30:00+01:00 73.0
2018-12-16 09:00:00+01:00 72.5
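
The difference between tz_localize and tz_convert in the two examples above can be seen side by side in a small sketch. The frame below is a hypothetical three-row stand-in for TZH.csv (the file itself is not reproduced here), and the names `prices`, `istanbul` and `karachi` are chosen only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for TZH.csv: three naive (zone-less) timestamps.
idx = pd.to_datetime(['2018-12-16 09:00', '2018-12-16 09:15', '2018-12-16 09:30'])
prices = pd.DataFrame({'Price': [72.38, 71.0, 71.67]}, index=idx)

# tz_localize attaches a zone to naive timestamps; the wall-clock time is kept.
istanbul = prices.tz_localize('Asia/Istanbul')

# tz_convert re-expresses the same instants in another zone; the wall-clock time changes.
karachi = istanbul.tz_convert('Asia/Karachi')

print(istanbul.index[0])  # 2018-12-16 09:00:00+03:00
print(karachi.index[0])   # 2018-12-16 11:00:00+05:00
```

Both frames describe the same moments in time; only the label shown for each moment differs.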

Arithmetic Functions

Data Frame-1

In [1084]:
pakistan= pd.read_csv('Karachi.csv',parse_dates=['Date'])
pakistan.head()
Out[1084]:
Date info
0 2018-12-16 00:00:00 0
1 2018-12-16 01:00:00 1
2 2018-12-16 02:00:00 2
3 2018-12-16 03:00:00 3
4 2018-12-16 04:00:00 4
In [1085]:
pakistan.set_index('Date', inplace = True)
In [1086]:
pakistan.head()
Out[1086]:
info
Date
2018-12-16 00:00:00 0
2018-12-16 01:00:00 1
2018-12-16 02:00:00 2
2018-12-16 03:00:00 3
2018-12-16 04:00:00 4
In [1087]:
pakistan.index
Out[1087]:
DatetimeIndex(['2018-12-16 00:00:00', '2018-12-16 01:00:00',
               '2018-12-16 02:00:00', '2018-12-16 03:00:00',
               '2018-12-16 04:00:00', '2018-12-16 05:00:00',
               '2018-12-16 06:00:00', '2018-12-16 07:00:00',
               '2018-12-16 08:00:00', '2018-12-16 09:00:00',
               '2018-12-16 10:00:00', '2018-12-16 11:00:00',
               '2018-12-16 12:00:00', '2018-12-16 13:00:00',
               '2018-12-16 14:00:00', '2018-12-16 15:00:00',
               '2018-12-16 16:00:00', '2018-12-16 17:00:00',
               '2018-12-16 18:00:00', '2018-12-16 19:00:00',
               '2018-12-16 20:00:00', '2018-12-16 21:00:00',
               '2018-12-16 22:00:00'],
              dtype='datetime64[ns]', name='Date', freq=None)
In [1088]:
pakistan=pakistan.tz_localize(tz='Asia/Karachi')
pakistan
Out[1088]:
info
Date
2018-12-16 00:00:00+05:00 0
2018-12-16 01:00:00+05:00 1
2018-12-16 02:00:00+05:00 2
2018-12-16 03:00:00+05:00 3
2018-12-16 04:00:00+05:00 4
2018-12-16 05:00:00+05:00 5
2018-12-16 06:00:00+05:00 6
2018-12-16 07:00:00+05:00 7
2018-12-16 08:00:00+05:00 8
2018-12-16 09:00:00+05:00 9
2018-12-16 10:00:00+05:00 10
2018-12-16 11:00:00+05:00 11
2018-12-16 12:00:00+05:00 12
2018-12-16 13:00:00+05:00 13
2018-12-16 14:00:00+05:00 14
2018-12-16 15:00:00+05:00 15
2018-12-16 16:00:00+05:00 16
2018-12-16 17:00:00+05:00 17
2018-12-16 18:00:00+05:00 18
2018-12-16 19:00:00+05:00 19
2018-12-16 20:00:00+05:00 20
2018-12-16 21:00:00+05:00 21
2018-12-16 22:00:00+05:00 22
In [1089]:
pakistan.index
Out[1089]:
DatetimeIndex(['2018-12-16 00:00:00+05:00', '2018-12-16 01:00:00+05:00',
               '2018-12-16 02:00:00+05:00', '2018-12-16 03:00:00+05:00',
               '2018-12-16 04:00:00+05:00', '2018-12-16 05:00:00+05:00',
               '2018-12-16 06:00:00+05:00', '2018-12-16 07:00:00+05:00',
               '2018-12-16 08:00:00+05:00', '2018-12-16 09:00:00+05:00',
               '2018-12-16 10:00:00+05:00', '2018-12-16 11:00:00+05:00',
               '2018-12-16 12:00:00+05:00', '2018-12-16 13:00:00+05:00',
               '2018-12-16 14:00:00+05:00', '2018-12-16 15:00:00+05:00',
               '2018-12-16 16:00:00+05:00', '2018-12-16 17:00:00+05:00',
               '2018-12-16 18:00:00+05:00', '2018-12-16 19:00:00+05:00',
               '2018-12-16 20:00:00+05:00', '2018-12-16 21:00:00+05:00',
               '2018-12-16 22:00:00+05:00'],
              dtype='datetime64[ns, Asia/Karachi]', name='Date', freq=None)

Data Frame-2

In [1090]:
turki= pd.read_csv('Istanbul.csv',parse_dates=['Date'])
turki.head()
Out[1090]:
Date info
0 2018-12-16 00:00:00 0
1 2018-12-16 01:00:00 1
2 2018-12-16 02:00:00 2
3 2018-12-16 03:00:00 3
4 2018-12-16 04:00:00 4
In [1091]:
turki.set_index('Date', inplace = True)
In [1092]:
turki.head()
Out[1092]:
info
Date
2018-12-16 00:00:00 0
2018-12-16 01:00:00 1
2018-12-16 02:00:00 2
2018-12-16 03:00:00 3
2018-12-16 04:00:00 4
In [1093]:
turki.index
Out[1093]:
DatetimeIndex(['2018-12-16 00:00:00', '2018-12-16 01:00:00',
               '2018-12-16 02:00:00', '2018-12-16 03:00:00',
               '2018-12-16 04:00:00', '2018-12-16 05:00:00',
               '2018-12-16 06:00:00', '2018-12-16 07:00:00',
               '2018-12-16 08:00:00', '2018-12-16 09:00:00',
               '2018-12-16 10:00:00', '2018-12-16 11:00:00',
               '2018-12-16 12:00:00', '2018-12-16 13:00:00',
               '2018-12-16 14:00:00', '2018-12-16 15:00:00',
               '2018-12-16 16:00:00', '2018-12-16 17:00:00',
               '2018-12-16 18:00:00', '2018-12-16 19:00:00',
               '2018-12-16 20:00:00', '2018-12-16 21:00:00',
               '2018-12-16 22:00:00'],
              dtype='datetime64[ns]', name='Date', freq=None)
In [1094]:
turki=turki.tz_localize(tz='Asia/Istanbul')
In [1095]:
turki.index
Out[1095]:
DatetimeIndex(['2018-12-16 00:00:00+03:00', '2018-12-16 01:00:00+03:00',
               '2018-12-16 02:00:00+03:00', '2018-12-16 03:00:00+03:00',
               '2018-12-16 04:00:00+03:00', '2018-12-16 05:00:00+03:00',
               '2018-12-16 06:00:00+03:00', '2018-12-16 07:00:00+03:00',
               '2018-12-16 08:00:00+03:00', '2018-12-16 09:00:00+03:00',
               '2018-12-16 10:00:00+03:00', '2018-12-16 11:00:00+03:00',
               '2018-12-16 12:00:00+03:00', '2018-12-16 13:00:00+03:00',
               '2018-12-16 14:00:00+03:00', '2018-12-16 15:00:00+03:00',
               '2018-12-16 16:00:00+03:00', '2018-12-16 17:00:00+03:00',
               '2018-12-16 18:00:00+03:00', '2018-12-16 19:00:00+03:00',
               '2018-12-16 20:00:00+03:00', '2018-12-16 21:00:00+03:00',
               '2018-12-16 22:00:00+03:00'],
              dtype='datetime64[ns, Asia/Istanbul]', name='Date', freq=None)
In [1096]:
turki
Out[1096]:
info
Date
2018-12-16 00:00:00+03:00 0
2018-12-16 01:00:00+03:00 1
2018-12-16 02:00:00+03:00 2
2018-12-16 03:00:00+03:00 3
2018-12-16 04:00:00+03:00 4
2018-12-16 05:00:00+03:00 5
2018-12-16 06:00:00+03:00 6
2018-12-16 07:00:00+03:00 7
2018-12-16 08:00:00+03:00 8
2018-12-16 09:00:00+03:00 9
2018-12-16 10:00:00+03:00 10
2018-12-16 11:00:00+03:00 11
2018-12-16 12:00:00+03:00 12
2018-12-16 13:00:00+03:00 13
2018-12-16 14:00:00+03:00 14
2018-12-16 15:00:00+03:00 15
2018-12-16 16:00:00+03:00 16
2018-12-16 17:00:00+03:00 17
2018-12-16 18:00:00+03:00 18
2018-12-16 19:00:00+03:00 19
2018-12-16 20:00:00+03:00 20
2018-12-16 21:00:00+03:00 21
2018-12-16 22:00:00+03:00 22
In [1097]:
f_res=pakistan+turki
f_res
Out[1097]:
info
Date
2018-12-15 19:00:00+00:00 nan
2018-12-15 20:00:00+00:00 nan
2018-12-15 21:00:00+00:00 2.0
2018-12-15 22:00:00+00:00 4.0
2018-12-15 23:00:00+00:00 6.0
2018-12-16 00:00:00+00:00 8.0
2018-12-16 01:00:00+00:00 10.0
2018-12-16 02:00:00+00:00 12.0
2018-12-16 03:00:00+00:00 14.0
2018-12-16 04:00:00+00:00 16.0
2018-12-16 05:00:00+00:00 18.0
2018-12-16 06:00:00+00:00 20.0
2018-12-16 07:00:00+00:00 22.0
2018-12-16 08:00:00+00:00 24.0
2018-12-16 09:00:00+00:00 26.0
2018-12-16 10:00:00+00:00 28.0
2018-12-16 11:00:00+00:00 30.0
2018-12-16 12:00:00+00:00 32.0
2018-12-16 13:00:00+00:00 34.0
2018-12-16 14:00:00+00:00 36.0
2018-12-16 15:00:00+00:00 38.0
2018-12-16 16:00:00+00:00 40.0
2018-12-16 17:00:00+00:00 42.0
2018-12-16 18:00:00+00:00 nan
2018-12-16 19:00:00+00:00 nan
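
The result above is indexed in UTC because the two frames carry different time zones, so pandas aligns the rows on the underlying instant before adding. A reduced sketch of the same behaviour, using hypothetical four-row stand-ins for Karachi.csv and Istanbul.csv, might look like this:

```python
import pandas as pd

# Hypothetical stand-ins for Karachi.csv and Istanbul.csv: four hourly rows each.
idx = pd.date_range('2018-12-16 00:00', periods=4, freq='60min')
karachi = pd.DataFrame({'info': [0, 1, 2, 3]}, index=idx).tz_localize('Asia/Karachi')
istanbul = pd.DataFrame({'info': [0, 1, 2, 3]}, index=idx).tz_localize('Asia/Istanbul')

# Rows are matched on the same instant, not on the same wall-clock time.
# Instants present in only one frame produce NaN.
total = karachi + istanbul
print(total)
print(total.index.tz)  # UTC
```

Only the two instants that exist in both frames (21:00 and 22:00 UTC on 2018-12-15) get a numeric sum; the other four rows are NaN, just as the first and last rows are in the full output above.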

tshift

  • Shift the time index, using the index’s frequency if available.
In [1098]:
df= pd.read_csv('egg_price.csv',parse_dates=['Date'],index_col='Date')
df
Out[1098]:
Price
Date
2018-12-03 106
2018-12-04 120
2018-12-05 100
2018-12-06 100
2018-12-07 120
2018-12-10 116
2018-12-11 125
2018-12-12 106
2018-12-13 112
2018-12-14 128
In [1099]:
df.tshift(1)
Out[1099]:
Price
Date
2018-12-04 106
2018-12-05 120
2018-12-06 100
2018-12-07 100
2018-12-10 120
2018-12-11 116
2018-12-12 125
2018-12-13 106
2018-12-14 112
2018-12-17 128
In [1100]:
df.tshift(-1)
Out[1100]:
Price
Date
2018-11-30 106
2018-12-03 120
2018-12-04 100
2018-12-05 100
2018-12-06 120
2018-12-07 116
2018-12-10 125
2018-12-11 106
2018-12-12 112
2018-12-13 128
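
Note that tshift was deprecated in pandas 1.1 and removed in pandas 2.0, so the cells above only run on older versions. On newer versions, shift with a freq argument gives the same kind of result. The sketch below uses a hypothetical four-row stand-in for egg_price.csv and passes the business-day frequency 'B' explicitly, which is an assumption here: the original call let pandas infer the frequency from the index.

```python
import pandas as pd

# Hypothetical stand-in for egg_price.csv: four business days spanning a weekend.
idx = pd.to_datetime(['2018-12-06', '2018-12-07', '2018-12-10', '2018-12-11'])
eggs = pd.DataFrame({'Price': [100, 120, 116, 125]}, index=idx)

# Move every index label forward by one business day; the values stay in place.
shifted = eggs.shift(1, freq='B')
print(shifted)
```

As in the tshift output, Friday 2018-12-07 moves to Monday 2018-12-10 because the weekend is skipped.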

first

  • first: Convenience method for subsetting initial periods of time series data based on a date offset.
In [1101]:
i = pd.date_range('2018-04-09', periods=4, freq='2D')
ts = pd.DataFrame({'A': [1,2,3,4]}, index=i)
ts
Out[1101]:
A
2018-04-09 1
2018-04-11 2
2018-04-13 3
2018-04-15 4
  • Get the rows for the first 3 days:
In [1102]:
ts.first('3D')
Out[1102]:
A
2018-04-09 1
2018-04-11 2

Notice that the data for the first 3 calendar days was returned, not the first 3 days observed in the dataset, and therefore the row for 2018-04-13 was not returned.

last

  • last: Convenience method for subsetting final periods of time series data based on a date offset.
In [1103]:
i = pd.date_range('2018-04-09', periods=4, freq='2D')
ts = pd.DataFrame({'A': [1,2,3,4]}, index=i)
ts
Out[1103]:
A
2018-04-09 1
2018-04-11 2
2018-04-13 3
2018-04-15 4
In [1104]:
ts.last('3D')
Out[1104]:
A
2018-04-13 3
2018-04-15 4
  • Notice that the data for the last 3 calendar days was returned, not the last 3 days observed in the dataset, and therefore the row for 2018-04-11 was not returned.
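
The calendar-window behaviour of first and last can be reproduced with plain boolean indexing, which may be useful because both methods are deprecated in recent pandas releases (since 2.1). This is a sketch of an equivalent selection for the '3D' case shown above, not the library's own implementation; the names `first_3d` and `last_3d` are chosen for illustration.

```python
import pandas as pd

i = pd.date_range('2018-04-09', periods=4, freq='2D')
ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)

window = pd.Timedelta('3D')

# Like ts.first('3D'): rows strictly before (first timestamp + 3 calendar days).
first_3d = ts.loc[ts.index < ts.index[0] + window]

# Like ts.last('3D'): rows strictly after (last timestamp - 3 calendar days).
last_3d = ts.loc[ts.index > ts.index[-1] - window]

print(first_3d)
print(last_3d)
```

The window is measured in calendar time from the first (or last) timestamp, which is why each selection holds two rows rather than three.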

between_time

Example-1

In [1105]:
df= pd.read_csv('Cell_H_O.csv',index_col="Time",parse_dates=["Time"])
df.head()
Out[1105]:
Date Cell CI Cell Name CellIndex Site Name GBSC Location Region City Integrity ... _HSR%_N _CSSR_Call Drops on SDCCH_N_2 _CSSR_Successful SDCCH Seizures_D_2 _CSSR_Successful TCH Seizures (Traffic Channel)_N_3 _CSSR_TCH Seizure Requests (Traffic Channel)_D_3 _Cssr_Channel Requests (Circuit Service)_D_1 _cssr_Call Setup Indications (Circuit Service)_N_1 Interference Band Proportion (4~5)(%) Interference Band 345 R3560:Maximum Number of Busy Channels (SDCCH)
Time
2019-10-11 8/2/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 78 0 312 42 42 180 179 0.0 0.131 5
2019-10-11 8/2/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 73 0 393 79 86 279 279 0.0 0.0 4
2019-10-11 8/2/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 95 0 536 37 38 404 398 0.051 0.051 5
2019-10-11 8/2/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 67 4 3206 121 122 1433 1430 0.1256 0.4216 12
2019-10-11 8/2/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 49 1 731 103 103 581 578 0.0 0.0294 6

5 rows × 225 columns

In [1106]:
df.between_time('6:00','8:00')
Out[1106]:
Date Cell CI Cell Name CellIndex Site Name GBSC Location Region City Integrity ... _HSR%_N _CSSR_Call Drops on SDCCH_N_2 _CSSR_Successful SDCCH Seizures_D_2 _CSSR_Successful TCH Seizures (Traffic Channel)_N_3 _CSSR_TCH Seizure Requests (Traffic Channel)_D_3 _Cssr_Channel Requests (Circuit Service)_D_1 _cssr_Call Setup Indications (Circuit Service)_N_1 Interference Band Proportion (4~5)(%) Interference Band 345 R3560:Maximum Number of Busy Channels (SDCCH)
Time
2019-10-11 06:00:00 8/2/2019 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 12 0 277 40 41 142 138 0.0 0.0115 5
2019-10-11 06:00:00 8/2/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 6 0 217 10 10 50 50 0.0406 0.0541 3
2019-10-11 06:00:00 8/2/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 21 0 1702 31 31 482 481 0.0154 0.1623 10
2019-10-11 06:00:00 8/2/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 34 0 364 36 37 234 234 0.0 0.043 4
2019-10-11 06:00:00 8/2/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 50 0 183 34 34 63 63 0.0131 0.0785 3
2019-10-11 06:00:00 8/2/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 28 0 266 25 25 113 113 0.0 0.0 4
2019-10-11 07:00:00 8/2/2019 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 53 0 320 56 57 172 172 0.0 0.0239 3
2019-10-11 07:00:00 8/2/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 76 0 267 55 55 102 101 0.0 0.1763 3
2019-10-11 07:00:00 8/2/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 17 0 296 28 28 138 138 0.0 0.0 4
2019-10-11 07:00:00 8/2/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 70 0 521 112 112 357 356 0.0 0.0375 5
2019-10-11 07:00:00 8/2/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 16 0 1614 53 56 205 200 0.0 0.1195 8
2019-10-11 07:00:00 8/2/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 53 0 296 47 47 157 152 0.0 0.0 3
2019-10-11 08:00:00 8/2/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 115 0 681 162 162 488 487 0.0 0.0387 6
2019-10-11 08:00:00 8/2/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 36 0 261 25 25 133 132 0.0 0.0 5
2019-10-11 08:00:00 8/2/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 146 0 468 116 117 305 303 0.0 0.0163 4
2019-10-11 08:00:00 8/2/2019 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 108 0 531 134 138 368 365 0.0 0.013999999999999999 5
2019-10-11 08:00:00 8/2/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 82 0 2337 142 144 482 478 0.10400000000000001 0.6844 11
2019-10-11 08:00:00 8/2/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 122 0 555 200 200 387 384 0.2003 1.0782 5
2019-10-11 06:00:00 8/3/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 54 1 161 11 11 57 57 0.0 0.1095 6
2019-10-11 06:00:00 8/3/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 18 0 2270 31 32 975 975 0.0393 0.2438 10
2019-10-11 06:00:00 8/3/2019 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 25 0 306 53 54 151 151 0.0116 0.0116 3
2019-10-11 06:00:00 8/3/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 32 0 332 54 55 172 172 0.0 0.0225 4
2019-10-11 06:00:00 8/3/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 18 0 258 36 37 91 86 0.0 0.1495 4
2019-10-11 06:00:00 8/3/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 17 0 305 44 44 186 186 0.0142 0.0497 4
2019-10-11 07:00:00 8/3/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 54 0 299 90 91 151 151 0.5338 6.1257 3
2019-10-11 07:00:00 8/3/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 5 0 160 16 16 77 77 0.0 0.0135 3
2019-10-11 07:00:00 8/3/2019 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 76 0 352 78 84 146 144 0.0 0.0 3
2019-10-11 07:00:00 8/3/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 64 0 343 81 81 174 174 0.0293 0.381 4
2019-10-11 07:00:00 8/3/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 45 1 2400 86 87 1119 1115 0.1146 0.2865 9
2019-10-11 07:00:00 8/3/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 79 0 334 74 74 203 200 0.9484 2.4017 4
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2019-10-11 07:00:00 8/8/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 39 0 1950 103 104 567 564 0.1858 0.3274 10
2019-10-11 07:00:00 8/8/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 0 0 176 4 4 99 99 0.0 0.1069 3
2019-10-11 07:00:00 8/8/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 98 0 538 132 132 361 358 0.0152 0.1213 6
2019-10-11 07:00:00 8/8/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 44 0 359 73 73 197 195 0.0 0.0 5
2019-10-11 07:00:00 8/8/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 47 0 360 108 108 224 224 0.0 0.1281 5
2019-10-11 07:00:00 8/8/2019 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 90 0 288 75 76 160 156 0.0 0.0 4
2019-10-11 08:00:00 8/8/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 32 0 365 32 32 247 247 0.0 0.0 8
2019-10-11 08:00:00 8/8/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 99 3 2152 131 131 395 386 0.1189 0.5211 10
2019-10-11 08:00:00 8/8/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 225 0 521 147 148 341 340 0.0 0.3065 6
2019-10-11 08:00:00 8/8/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 164 1 874 209 210 667 667 0.051 0.1361 6
2019-10-11 08:00:00 8/8/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 116 0 475 135 135 321 318 0.0 0.1134 5
2019-10-11 08:00:00 8/8/2019 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 179 0 393 102 103 232 232 0.0 0.0134 5
2019-10-11 06:00:00 8/9/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 26 0 247 24 24 155 154 0.0 0.0 9
2019-10-11 06:00:00 8/9/2019 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 16 0 321 49 50 149 149 0.0 0.0 5
2019-10-11 06:00:00 8/9/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 26 1 1256 28 28 65 65 0.0242 0.5166 7
2019-10-11 06:00:00 8/9/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 28 0 260 41 41 140 139 0.0222 0.0296 3
2019-10-11 06:00:00 8/9/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 22 0 305 51 52 185 185 0.0 0.0222 4
2019-10-11 06:00:00 8/9/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 40 0 260 49 50 141 141 0.0 0.7169 3
2019-10-11 07:00:00 8/9/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 56 0 362 103 103 213 208 0.0132 0.1453 4
2019-10-11 07:00:00 8/9/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 128 0 499 147 147 368 366 0.0386 0.2084 4
2019-10-11 07:00:00 8/9/2019 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 54 0 386 58 58 207 207 0.0 0.0 5
2019-10-11 07:00:00 8/9/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 39 0 224 20 20 133 133 0.0 0.0 5
2019-10-11 07:00:00 8/9/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 32 1 1458 87 87 249 235 0.0932 0.8557 8
2019-10-11 07:00:00 8/9/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 63 0 320 80 80 194 193 0.0 0.0321 3
2019-10-11 08:00:00 8/9/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 93 0 462 128 128 360 358 0.0082 0.0733 5
2019-10-11 08:00:00 8/9/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 218 0 706 203 206 549 549 0.0438 0.2366 5
2019-10-11 08:00:00 8/9/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 51 5 2301 107 109 408 408 0.3858 1.5935 11
2019-10-11 08:00:00 8/9/2019 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 127 1 544 105 106 305 304 0.0 0.0143 5
2019-10-11 08:00:00 8/9/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 125 0 519 135 135 382 379 0.315 1.0453 4
2019-10-11 08:00:00 8/9/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 42 0 378 44 44 251 251 0.0 0.0556 5

144 rows × 225 columns
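
Because the output above is very wide, the effect of between_time is easier to follow on a small frame. The sketch below uses hypothetical hourly data covering two days (the column name `value` and the frame name `kpi` are made up for illustration) and selects the same 06:00 to 08:00 window.

```python
import pandas as pd

# Hypothetical hourly samples over two days, standing in for the cell KPI file.
idx = pd.date_range('2019-08-02 00:00', periods=48, freq='60min')
kpi = pd.DataFrame({'value': range(48)}, index=idx)

# Keep rows whose time of day lies between 06:00 and 08:00 (both ends included by default).
# The date part of the index is ignored, so the window is applied to every day.
morning = kpi.between_time('6:00', '8:00')
print(morning)
```

Three rows per day are returned (06:00, 07:00 and 08:00), so the two-day frame yields six rows.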

Example-2

In [1107]:
df= pd.read_csv('Cell_H.csv',index_col="Time",parse_dates=["Time"])
df.head()
Out[1107]:
Cell CI Cell Name CellIndex Site Name GBSC Location Region City Integrity CSSR (Excl Loc Update)(%) ... _HSR%_N _CSSR_Call Drops on SDCCH_N_2 _CSSR_Successful SDCCH Seizures_D_2 _CSSR_Successful TCH Seizures (Traffic Channel)_N_3 _CSSR_TCH Seizure Requests (Traffic Channel)_D_3 _Cssr_Channel Requests (Circuit Service)_D_1 _cssr_Call Setup Indications (Circuit Service)_N_1 Interference Band Proportion (4~5)(%) Interference Band 345 R3560:Maximum Number of Busy Channels (SDCCH)
Time
2019-08-02 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 99.4444 ... 78 0 312 42 42 180 179 0.0 0.131 5
2019-08-02 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 98.8492 ... 67 4 3206 121 122 1433 1430 0.1256 0.4216 12
2019-08-02 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 95.9224 ... 95 0 536 37 38 404 398 0.051 0.051 5
2019-08-02 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.6296 ... 74 0 426 48 48 270 269 0.0 0.0 4
2019-08-02 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.3476 ... 49 1 731 103 103 581 578 0.0 0.0294 6

5 rows × 224 columns

In [1108]:
df.between_time('6:00','8:00')
Out[1108]:
Cell CI Cell Name CellIndex Site Name GBSC Location Region City Integrity CSSR (Excl Loc Update)(%) ... _HSR%_N _CSSR_Call Drops on SDCCH_N_2 _CSSR_Successful SDCCH Seizures_D_2 _CSSR_Successful TCH Seizures (Traffic Channel)_N_3 _CSSR_TCH Seizure Requests (Traffic Channel)_D_3 _Cssr_Channel Requests (Circuit Service)_D_1 _cssr_Call Setup Indications (Circuit Service)_N_1 Interference Band Proportion (4~5)(%) Interference Band 345 R3560:Maximum Number of Busy Channels (SDCCH)
Time
2019-08-02 06:00:00 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 94.8128 ... 12 0 277 40 41 142 138 0.0 0.0115 5
2019-08-02 06:00:00 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 100.0 ... 6 0 217 10 10 50 50 0.0406 0.0541 3
2019-08-02 06:00:00 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 99.7925 ... 21 0 1702 31 31 482 481 0.0154 0.1623 10
2019-08-02 06:00:00 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 100.0 ... 28 0 266 25 25 113 113 0.0 0.0 4
2019-08-02 06:00:00 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 97.2973 ... 34 0 364 36 37 234 234 0.0 0.043 4
2019-08-02 06:00:00 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 100.0 ... 50 0 183 34 34 63 63 0.0131 0.0785 3
2019-08-02 07:00:00 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.2456 ... 53 0 320 56 57 172 172 0.0 0.0239 3
2019-08-02 07:00:00 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 100.0 ... 17 0 296 28 28 138 138 0.0 0.0 4
2019-08-02 07:00:00 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.7199 ... 70 0 521 112 112 357 356 0.0 0.0375 5
2019-08-02 07:00:00 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 96.8153 ... 53 0 296 47 47 157 152 0.0 0.0 3
2019-08-02 07:00:00 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.0196 ... 76 0 267 55 55 102 101 0.0 0.1763 3
2019-08-02 07:00:00 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 92.3345 ... 16 0 1614 53 56 205 200 0.0 0.1195 8
2019-08-02 08:00:00 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 96.3099 ... 108 0 531 134 138 368 365 0.0 0.013999999999999999 5
2019-08-02 08:00:00 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.4952 ... 146 0 468 116 117 305 303 0.0 0.0163 4
2019-08-02 08:00:00 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.7951 ... 115 0 681 162 162 488 487 0.0 0.0387 6
2019-08-02 08:00:00 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 99.2481 ... 36 0 261 25 25 133 132 0.0 0.0 5
2019-08-02 08:00:00 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.2248 ... 122 0 555 200 200 387 384 0.2003 1.0782 5
2019-08-02 08:00:00 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 97.7928 ... 82 0 2337 142 144 482 478 0.10400000000000001 0.6844 11
2019-08-03 06:00:00 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 91.9513 ... 18 0 258 36 37 91 86 0.0 0.1495 4
2019-08-03 06:00:00 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.1818 ... 32 0 332 54 55 172 172 0.0 0.0225 4
2019-08-03 06:00:00 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 96.875 ... 18 0 2270 31 32 975 975 0.0393 0.2438 10
2019-08-03 06:00:00 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 99.3789 ... 54 1 161 11 11 57 57 0.0 0.1095 6
2019-08-03 06:00:00 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.1481 ... 25 0 306 53 54 151 151 0.0116 0.0116 3
2019-08-03 06:00:00 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 100.0 ... 17 0 305 44 44 186 186 0.0142 0.0497 4
2019-08-03 07:00:00 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 98.4562 ... 45 1 2400 86 87 1119 1115 0.1146 0.2865 9
2019-08-03 07:00:00 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.5222 ... 79 0 334 74 74 203 200 0.9484 2.4017 4
2019-08-03 07:00:00 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 100.0 ... 5 0 160 16 16 77 77 0.0 0.0135 3
2019-08-03 07:00:00 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 91.5851 ... 76 0 352 78 84 146 144 0.0 0.0 3
2019-08-03 07:00:00 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.9011 ... 54 0 299 90 91 151 151 0.5338 6.1257 3
2019-08-03 07:00:00 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 100.0 ... 64 0 343 81 81 174 174 0.0293 0.381 4
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2019-08-08 07:00:00 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.9848 ... 44 0 359 73 73 197 195 0.0 0.0 5
2019-08-08 07:00:00 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 100.0 ... 47 0 360 108 108 224 224 0.0 0.1281 5
2019-08-08 07:00:00 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 98.5144 ... 39 0 1950 103 104 567 564 0.1858 0.3274 10
2019-08-08 07:00:00 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 96.2171 ... 90 0 288 75 76 160 156 0.0 0.0 4
2019-08-08 07:00:00 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 100.0 ... 0 0 176 4 4 99 99 0.0 0.1069 3
2019-08-08 07:00:00 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.169 ... 98 0 538 132 132 361 358 0.0152 0.1213 6
2019-08-08 08:00:00 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 100.0 ... 32 0 365 32 32 247 247 0.0 0.0 8
2019-08-08 08:00:00 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 97.5853 ... 99 3 2152 131 131 395 386 0.1189 0.5211 10
2019-08-08 08:00:00 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.0291 ... 179 0 393 102 103 232 232 0.0 0.0134 5
2019-08-08 08:00:00 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.4099 ... 164 1 874 209 210 667 667 0.051 0.1361 6
2019-08-08 08:00:00 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.0654 ... 116 0 475 135 135 321 318 0.0 0.1134 5
2019-08-08 08:00:00 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.0331 ... 225 0 521 147 148 341 340 0.0 0.3065 6
2019-08-09 06:00:00 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 99.9204 ... 26 1 1256 28 28 65 65 0.0242 0.5166 7
2019-08-09 06:00:00 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 99.3548 ... 26 0 247 24 24 155 154 0.0 0.0 9
2019-08-09 06:00:00 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.0769 ... 22 0 305 51 52 185 185 0.0 0.0222 4
2019-08-09 06:00:00 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.0 ... 16 0 321 49 50 149 149 0.0 0.0 5
2019-08-09 06:00:00 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.0 ... 40 0 260 49 50 141 141 0.0 0.7169 3
2019-08-09 06:00:00 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.2857 ... 28 0 260 41 41 140 139 0.0222 0.0296 3
2019-08-09 07:00:00 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 97.6526 ... 56 0 362 103 103 213 208 0.0132 0.1453 4
2019-08-09 07:00:00 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.4845 ... 63 0 320 80 80 194 193 0.0 0.0321 3
2019-08-09 07:00:00 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 100.0 ... 39 0 224 20 20 133 133 0.0 0.0 5
2019-08-09 07:00:00 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 94.3128 ... 32 1 1458 87 87 249 235 0.0932 0.8557 8
2019-08-09 07:00:00 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.4565 ... 128 0 499 147 147 368 366 0.0386 0.2084 4
2019-08-09 07:00:00 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 100.0 ... 54 0 386 58 58 207 207 0.0 0.0 5
2019-08-09 08:00:00 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.2147 ... 125 0 519 135 135 382 379 0.315 1.0453 4
2019-08-09 08:00:00 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 97.9518 ... 51 5 2301 107 109 408 408 0.3858 1.5935 11
2019-08-09 08:00:00 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.4444 ... 93 0 462 128 128 360 358 0.0082 0.0733 5
2019-08-09 08:00:00 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 100.0 ... 42 0 378 44 44 251 251 0.0 0.0556 5
2019-08-09 08:00:00 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.5503 ... 127 1 544 105 106 305 304 0.0 0.0143 5
2019-08-09 08:00:00 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.5437 ... 218 0 706 203 206 549 549 0.0438 0.2366 5

144 rows × 224 columns

at_time

Example-1

In [1109]:
df = pd.read_csv('Cell_H_O.csv', index_col="Time", parse_dates=["Time"])
df.head()
Out[1109]:
Date Cell CI Cell Name CellIndex Site Name GBSC Location Region City Integrity ... _HSR%_N _CSSR_Call Drops on SDCCH_N_2 _CSSR_Successful SDCCH Seizures_D_2 _CSSR_Successful TCH Seizures (Traffic Channel)_N_3 _CSSR_TCH Seizure Requests (Traffic Channel)_D_3 _Cssr_Channel Requests (Circuit Service)_D_1 _cssr_Call Setup Indications (Circuit Service)_N_1 Interference Band Proportion (4~5)(%) Interference Band 345 R3560:Maximum Number of Busy Channels (SDCCH)
Time
2019-10-11 8/2/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 78 0 312 42 42 180 179 0.0 0.131 5
2019-10-11 8/2/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 73 0 393 79 86 279 279 0.0 0.0 4
2019-10-11 8/2/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 95 0 536 37 38 404 398 0.051 0.051 5
2019-10-11 8/2/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 67 4 3206 121 122 1433 1430 0.1256 0.4216 12
2019-10-11 8/2/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 49 1 731 103 103 581 578 0.0 0.0294 6

5 rows × 225 columns

In [1110]:
df.at_time('6:00')
Out[1110]:
Date Cell CI Cell Name CellIndex Site Name GBSC Location Region City Integrity ... _HSR%_N _CSSR_Call Drops on SDCCH_N_2 _CSSR_Successful SDCCH Seizures_D_2 _CSSR_Successful TCH Seizures (Traffic Channel)_N_3 _CSSR_TCH Seizure Requests (Traffic Channel)_D_3 _Cssr_Channel Requests (Circuit Service)_D_1 _cssr_Call Setup Indications (Circuit Service)_N_1 Interference Band Proportion (4~5)(%) Interference Band 345 R3560:Maximum Number of Busy Channels (SDCCH)
Time
2019-10-11 06:00:00 8/2/2019 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 12 0 277 40 41 142 138 0.0 0.0115 5
2019-10-11 06:00:00 8/2/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 6 0 217 10 10 50 50 0.0406 0.0541 3
2019-10-11 06:00:00 8/2/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 21 0 1702 31 31 482 481 0.0154 0.1623 10
2019-10-11 06:00:00 8/2/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 34 0 364 36 37 234 234 0.0 0.043 4
2019-10-11 06:00:00 8/2/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 50 0 183 34 34 63 63 0.0131 0.0785 3
2019-10-11 06:00:00 8/2/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 28 0 266 25 25 113 113 0.0 0.0 4
2019-10-11 06:00:00 8/3/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 54 1 161 11 11 57 57 0.0 0.1095 6
2019-10-11 06:00:00 8/3/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 18 0 2270 31 32 975 975 0.0393 0.2438 10
2019-10-11 06:00:00 8/3/2019 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 25 0 306 53 54 151 151 0.0116 0.0116 3
2019-10-11 06:00:00 8/3/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 32 0 332 54 55 172 172 0.0 0.0225 4
2019-10-11 06:00:00 8/3/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 18 0 258 36 37 91 86 0.0 0.1495 4
2019-10-11 06:00:00 8/3/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 17 0 305 44 44 186 186 0.0142 0.0497 4
2019-10-11 06:00:00 8/4/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 35 0 388 46 46 190 190 0.0 0.0368 4
2019-10-11 06:00:00 8/4/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 48 0 346 58 58 218 218 0.0 0.0144 4
2019-10-11 06:00:00 8/4/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 14 0 306 8 10 128 128 0.0 0.0276 7
2019-10-11 06:00:00 8/4/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 26 2 2031 36 36 991 990 0.0641 0.2163 10
2019-10-11 06:00:00 8/4/2019 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 7 0 294 23 23 137 136 0.0 0.0 4
2019-10-11 06:00:00 8/4/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 23 0 274 75 75 109 109 0.0 0.1508 6
2019-10-11 06:00:00 8/5/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 19 0 239 63 63 113 113 0.1246 0.3987 3
2019-10-11 06:00:00 8/5/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 22 0 1175 51 51 129 125 0.0315 0.1651 8
2019-10-11 06:00:00 8/5/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 48 0 279 55 55 177 177 0.0074 0.0223 3
2019-10-11 06:00:00 8/5/2019 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 10 0 263 26 26 120 120 0.0 0.0 3
2019-10-11 06:00:00 8/5/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 27 0 401 51 51 185 185 0.0 0.0 4
2019-10-11 06:00:00 8/5/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 7 0 250 5 5 133 133 0.0 0.0 4
2019-10-11 06:00:00 8/6/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 7 0 245 17 17 133 133 0.0 0.0 3
2019-10-11 06:00:00 8/6/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 25 0 357 42 42 210 209 0.0 0.0498 4
2019-10-11 06:00:00 8/6/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 14 2 2145 33 33 995 990 0.016 0.4796 8
2019-10-11 06:00:00 8/6/2019 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 27 0 288 34 34 165 162 0.0 0.0 3
2019-10-11 06:00:00 8/6/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 16 0 367 38 38 132 132 0.0 0.0451 5
2019-10-11 06:00:00 8/6/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 29 0 252 41 41 100 98 0.0 0.0 4
2019-10-11 06:00:00 8/7/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 3 0 265 8 8 104 104 0.0 0.0138 4
2019-10-11 06:00:00 8/7/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 11 0 2248 54 54 1058 1058 0.0241 0.1287 9
2019-10-11 06:00:00 8/7/2019 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 25 0 249 34 34 127 127 0.0 0.0 4
2019-10-11 06:00:00 8/7/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 34 0 353 56 57 243 243 0.0 0.0979 5
2019-10-11 06:00:00 8/7/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 40 0 299 28 28 149 147 0.0 0.0072 4
2019-10-11 06:00:00 8/7/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 54 0 220 45 45 99 94 0.0 0.5479 4
2019-10-11 06:00:00 8/8/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 23 0 269 45 45 100 99 0.0393 0.3533 3
2019-10-11 06:00:00 8/8/2019 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 32 0 230 49 49 110 109 0.0 0.0 4
2019-10-11 06:00:00 8/8/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 23 0 306 32 32 217 216 0.0284 0.064 4
2019-10-11 06:00:00 8/8/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 31 0 311 29 30 138 138 0.0074 0.0074 4
2019-10-11 06:00:00 8/8/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 4 0 298 8 8 191 190 0.0136 0.0817 7
2019-10-11 06:00:00 8/8/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 19 2 2121 44 44 999 999 0.0169 0.1948 8
2019-10-11 06:00:00 8/9/2019 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 26 0 247 24 24 155 154 0.0 0.0 9
2019-10-11 06:00:00 8/9/2019 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 16 0 321 49 50 149 149 0.0 0.0 5
2019-10-11 06:00:00 8/9/2019 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% ... 26 1 1256 28 28 65 65 0.0242 0.5166 7
2019-10-11 06:00:00 8/9/2019 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 28 0 260 41 41 140 139 0.0222 0.0296 3
2019-10-11 06:00:00 8/9/2019 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 22 0 305 51 52 185 185 0.0 0.0222 4
2019-10-11 06:00:00 8/9/2019 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% ... 40 0 260 49 50 141 141 0.0 0.7169 3

48 rows × 225 columns
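
The call `df.at_time('6:00')` above keeps only the rows whose index time of day is exactly 06:00, whatever the date. The following is a minimal sketch of that behaviour on a small synthetic frame; the timestamps, the column name `kpi`, and its values are hypothetical and are not taken from `Cell_H_O.csv`.

```python
import pandas as pd

# Synthetic timestamps: two days, three hourly samples each (hypothetical data)
stamps = pd.to_datetime([
    "2019-08-02 06:00", "2019-08-02 07:00", "2019-08-02 08:00",
    "2019-08-03 06:00", "2019-08-03 07:00", "2019-08-03 08:00",
])
demo = pd.DataFrame({"kpi": [10, 20, 30, 40, 50, 60]}, index=stamps)

# at_time needs a DatetimeIndex; it keeps rows whose time of day is 06:00
six_am = demo.at_time("6:00")
print(six_am)
```

Only the two 06:00 rows (one per day) remain. In Example-1 every index entry carries the same date, presumably because the `Time` column holds only a time of day and the parser supplies a default date; `at_time` ignores the date part, so the selection still works.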

Example-2

In [1111]:
df = pd.read_csv('Cell_H.csv', index_col="Time", parse_dates=["Time"])
df.head()
Out[1111]:
Cell CI Cell Name CellIndex Site Name GBSC Location Region City Integrity CSSR (Excl Loc Update)(%) ... _HSR%_N _CSSR_Call Drops on SDCCH_N_2 _CSSR_Successful SDCCH Seizures_D_2 _CSSR_Successful TCH Seizures (Traffic Channel)_N_3 _CSSR_TCH Seizure Requests (Traffic Channel)_D_3 _Cssr_Channel Requests (Circuit Service)_D_1 _cssr_Call Setup Indications (Circuit Service)_N_1 Interference Band Proportion (4~5)(%) Interference Band 345 R3560:Maximum Number of Busy Channels (SDCCH)
Time
2019-08-02 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 99.4444 ... 78 0 312 42 42 180 179 0.0 0.131 5
2019-08-02 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 98.8492 ... 67 4 3206 121 122 1433 1430 0.1256 0.4216 12
2019-08-02 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 95.9224 ... 95 0 536 37 38 404 398 0.051 0.051 5
2019-08-02 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.6296 ... 74 0 426 48 48 270 269 0.0 0.0 4
2019-08-02 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.3476 ... 49 1 731 103 103 581 578 0.0 0.0294 6

5 rows × 224 columns

In [1112]:
df.at_time('6:00')
Out[1112]:
Cell CI Cell Name CellIndex Site Name GBSC Location Region City Integrity CSSR (Excl Loc Update)(%) ... _HSR%_N _CSSR_Call Drops on SDCCH_N_2 _CSSR_Successful SDCCH Seizures_D_2 _CSSR_Successful TCH Seizures (Traffic Channel)_N_3 _CSSR_TCH Seizure Requests (Traffic Channel)_D_3 _Cssr_Channel Requests (Circuit Service)_D_1 _cssr_Call Setup Indications (Circuit Service)_N_1 Interference Band Proportion (4~5)(%) Interference Band 345 R3560:Maximum Number of Busy Channels (SDCCH)
Time
2019-08-02 06:00:00 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 94.8128 ... 12 0 277 40 41 142 138 0.0 0.0115 5
2019-08-02 06:00:00 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 100.0 ... 6 0 217 10 10 50 50 0.0406 0.0541 3
2019-08-02 06:00:00 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 99.7925 ... 21 0 1702 31 31 482 481 0.0154 0.1623 10
2019-08-02 06:00:00 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 100.0 ... 28 0 266 25 25 113 113 0.0 0.0 4
2019-08-02 06:00:00 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 97.2973 ... 34 0 364 36 37 234 234 0.0 0.043 4
2019-08-02 06:00:00 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 100.0 ... 50 0 183 34 34 63 63 0.0131 0.0785 3
2019-08-03 06:00:00 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 91.9513 ... 18 0 258 36 37 91 86 0.0 0.1495 4
2019-08-03 06:00:00 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.1818 ... 32 0 332 54 55 172 172 0.0 0.0225 4
2019-08-03 06:00:00 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 96.875 ... 18 0 2270 31 32 975 975 0.0393 0.2438 10
2019-08-03 06:00:00 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 99.3789 ... 54 1 161 11 11 57 57 0.0 0.1095 6
2019-08-03 06:00:00 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.1481 ... 25 0 306 53 54 151 151 0.0116 0.0116 3
2019-08-03 06:00:00 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 100.0 ... 17 0 305 44 44 186 186 0.0142 0.0497 4
2019-08-04 06:00:00 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.2701 ... 7 0 294 23 23 137 136 0.0 0.0 4
2019-08-04 06:00:00 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 100.0 ... 35 0 388 46 46 190 190 0.0 0.0368 4
2019-08-04 06:00:00 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 99.8007 ... 26 2 2031 36 36 991 990 0.0641 0.2163 10
2019-08-04 06:00:00 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 100.0 ... 48 0 346 58 58 218 218 0.0 0.0144 4
2019-08-04 06:00:00 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 80.0 ... 14 0 306 8 10 128 128 0.0 0.0276 7
2019-08-04 06:00:00 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 100.0 ... 23 0 274 75 75 109 109 0.0 0.1508 6
2019-08-05 06:00:00 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 100.0 ... 7 0 250 5 5 133 133 0.0 0.0 4
2019-08-05 06:00:00 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 96.8992 ... 22 0 1175 51 51 129 125 0.0315 0.1651 8
2019-08-05 06:00:00 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 100.0 ... 19 0 239 63 63 113 113 0.1246 0.3987 3
2019-08-05 06:00:00 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 100.0 ... 27 0 401 51 51 185 185 0.0 0.0 4
2019-08-05 06:00:00 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 100.0 ... 48 0 279 55 55 177 177 0.0074 0.0223 3
2019-08-05 06:00:00 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 100.0 ... 10 0 263 26 26 120 120 0.0 0.0 3
2019-08-06 06:00:00 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 100.0 ... 16 0 367 38 38 132 132 0.0 0.0451 5
2019-08-06 06:00:00 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 100.0 ... 7 0 245 17 17 133 133 0.0 0.0 3
2019-08-06 06:00:00 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.1818 ... 27 0 288 34 34 165 162 0.0 0.0 3
2019-08-06 06:00:00 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.0 ... 29 0 252 41 41 100 98 0.0 0.0 4
2019-08-06 06:00:00 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 99.4047 ... 14 2 2145 33 33 995 990 0.016 0.4796 8
2019-08-06 06:00:00 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.5238 ... 25 0 357 42 42 210 209 0.0 0.0498 4
2019-08-07 06:00:00 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.6577 ... 40 0 299 28 28 149 147 0.0 0.0072 4
2019-08-07 06:00:00 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 100.0 ... 25 0 249 34 34 127 127 0.0 0.0 4
2019-08-07 06:00:00 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 100.0 ... 3 0 265 8 8 104 104 0.0 0.0138 4
2019-08-07 06:00:00 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.2456 ... 34 0 353 56 57 243 243 0.0 0.0979 5
2019-08-07 06:00:00 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 100.0 ... 11 0 2248 54 54 1058 1058 0.0241 0.1287 9
2019-08-07 06:00:00 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 94.9495 ... 54 0 220 45 45 99 94 0.0 0.5479 4
2019-08-08 06:00:00 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.0 ... 23 0 269 45 45 100 99 0.0393 0.3533 3
2019-08-08 06:00:00 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 99.9057 ... 19 2 2121 44 44 999 999 0.0169 0.1948 8
2019-08-08 06:00:00 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 96.6667 ... 31 0 311 29 30 138 138 0.0074 0.0074 4
2019-08-08 06:00:00 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.0909 ... 32 0 230 49 49 110 109 0.0 0.0 4
2019-08-08 06:00:00 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.5392 ... 23 0 306 32 32 217 216 0.0284 0.064 4
2019-08-08 06:00:00 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 99.4764 ... 4 0 298 8 8 191 190 0.0136 0.0817 7
2019-08-09 06:00:00 23010 23010_PTCL Exchange Baghban (Gold) Lahore-Z1 (... 103 3010_PTCL Exchange Baghban (Gold) Lahore-Z1 (3... HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 99.9204 ... 26 1 1256 28 28 65 65 0.0242 0.5166 7
2019-08-09 06:00:00 23041 23041_Shad Bagh Lahore-Z1 (3G-CI-4199) 90 3041_Shad Bagh Lahore-Z1 (3G-CI-4199) HLHRBSC08 LAHORE_CLUSTER_03_Urban CENTER01 LAHORE_City 100% 99.3548 ... 26 0 247 24 24 155 154 0.0 0.0 9
2019-08-09 06:00:00 13007 13007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI... 83 3007_Rana Ice Factory (Gold) Lahore-Z1 (3G-CI-... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.0769 ... 22 0 305 51 52 185 185 0.0 0.0222 4
2019-08-09 06:00:00 23114 23114_Imamia Colony Shahdara Sheikhupura (3G-C... 24 3114_Imamia Colony Shahdara Sheikhupura (3G-CI... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.0 ... 16 0 321 49 50 149 149 0.0 0.0 5
2019-08-09 06:00:00 13107 13107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 172 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 98.0 ... 40 0 260 49 50 141 141 0.0 0.7169 3
2019-08-09 06:00:00 33107 33107_Shahdra Machis Factory Lahore-Z1 (3G-CI-... 174 3107_Shahdra Machis Factory Lahore-Z1 (3G-CI-4... HLHRBSC08 LAHORE_CLUSTER_01_Urban CENTER01 SHAHDARA_City 100% 99.2857 ... 28 0 260 41 41 140 139 0.0222 0.0296 3

48 rows × 224 columns

More of your pandas questions answered

In [1113]:
from IPython.display import YouTubeVideo
YouTubeVideo('oH3wYKvwpJ8',width=900, height=500)
Out[1113]:

MultiIndex in pandas

In [1114]:
from IPython.display import YouTubeVideo
YouTubeVideo('tcRGa2soc-c',width=900, height=500)
Out[1114]:

New time-saving tricks in pandas

In [1115]:
from IPython.display import YouTubeVideo
YouTubeVideo('-NbY7E9hKxk',width=900, height=500)
Out[1115]:
In [1116]:
from IPython.display import YouTubeVideo
YouTubeVideo('te5JrSCW-LY',width=900, height=500)
Out[1116]:
In [1117]:
from IPython.display import YouTubeVideo
YouTubeVideo('CWRKgBtZN18',width=900, height=500)
Out[1117]:
In [1118]:
from IPython.display import YouTubeVideo
YouTubeVideo('RlIiVeig3hc',width=900, height=500)
Out[1118]:

Shifting and Lagging

In [1119]:
from IPython.display import YouTubeVideo
YouTubeVideo('0lsmdNLNorY',width=900, height=500)
Out[1119]:
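
As a minimal sketch of what shifting and lagging means in pandas (the series name and its values below are hypothetical, not taken from the data set above): Series.shift moves the values along the index by a number of periods, so each row can be compared with the previous or the next one.

```python
import pandas as pd

# Hypothetical daily counts; the values are made up for illustration.
idx = pd.date_range("2019-08-04", periods=4, freq="D")
calls = pd.Series([388, 401, 367, 299], index=idx)

prev_day = calls.shift(1)      # lag: each row gets the previous day's value
next_day = calls.shift(-1)     # lead: each row gets the next day's value
change = calls - prev_day      # day-over-day difference

print(pd.DataFrame({"calls": calls, "prev_day": prev_day, "change": change}))
```

The first row of a lagged series has no earlier value to take, so it holds NaN; the same applies to the last row when shifting by a negative number.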

get

  • str.get(i) extracts the element at position i from each value in the Series/Index.
  • It works on values that are lists, tuples, or strings; a position that is out of range for a value gives NaN.
In [1120]:
df_who = pd.read_csv('WHO_csv.csv', dtype={"Region": "category"})
df_who.head()
Out[1120]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.4 60 98.5 54.26 nan 1,140.0 nan nan
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 nan 8,820.0 nan nan
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 nan 8,310.0 98.2 96.4
3 Andorra Europe 78 15.2 22.86 nan 82 3.2 75.49 nan nan 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.1 51 163.5 48.38 70.1 5,230.0 93.1 78.2
In [1121]:
# Store the first character of each country name in a new column
df_who['Country0'] = df_who['Country'].str.get(0)
df_who.head()
Out[1121]:
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale Country0
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.4 60 98.5 54.26 nan 1,140.0 nan nan A
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 nan 8,820.0 nan nan A
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 nan 8,310.0 98.2 96.4 A
3 Andorra Europe 78 15.2 22.86 nan 82 3.2 75.49 nan nan 78.4 79.4 A
4 Angola Africa 20821 47.58 3.84 6.1 51 163.5 48.38 70.1 5,230.0 93.1 78.2 A
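
The example above only takes the first character of a string. A small sketch, using hypothetical series that are not part of WHO_csv.csv, shows the other cases described for str.get: negative positions, out-of-range positions, and values that are lists or tuples.

```python
import pandas as pd

# Hypothetical string data
countries = pd.Series(["Afghanistan", "Albania", "Qatar"])

first_letter = countries.str.get(0)    # first character of each name
last_letter = countries.str.get(-1)    # negative positions count from the end
eleventh = countries.str.get(10)       # NaN where the name is shorter than 11 characters

# Hypothetical list/tuple data: str.get also indexes into each container
pairs = pd.Series([["x", 1], ("y", 2)])
labels = pairs.str.get(0)

print(first_letter.tolist())
print(last_letter.tolist())
print(eleventh.tolist())
print(labels.tolist())
```

Because out-of-range positions give NaN rather than raising an error, str.get is convenient on columns whose values differ in length.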